Re: [ARM] PR66791: Replace builtins in vshl_n

2021-07-23 Thread Prathamesh Kulkarni via Gcc-patches
On Thu, 22 Jul 2021 at 20:29, Richard Earnshaw
 wrote:
>
>
>
> On 22/07/2021 14:47, Prathamesh Kulkarni via Gcc-patches wrote:
> > On Thu, 22 Jul 2021 at 17:28, Richard Earnshaw
> >  wrote:
> >>
> >>
> >>
> >> On 22/07/2021 12:32, Prathamesh Kulkarni wrote:
> >>> On Thu, 22 Jul 2021 at 16:03, Richard Earnshaw
> >>>  wrote:
> 
> 
> 
>  On 22/07/2021 08:45, Prathamesh Kulkarni via Gcc-patches wrote:
> > Hi,
> > The attached patch removes calls to builtins from the vshl_n intrinsics
> > and replaces them with the left shift operator. The patch passes
> > bootstrap+test on arm-linux-gnueabihf.
> >
> > However, I noticed that the patch causes 3 extra registers to be spilled
> > in vshl_n.c when using << instead of the builtin. Could that perhaps be
> > due to inlining of the intrinsics?
> > Before the patch, the shift operation was performed by a call to
> > __builtin_neon_vshl (__a, __b),
> > and now it's inlined to __a << __b, which might result in increased
> > register pressure?
> >
> > Thanks,
> > Prathamesh
> >
> 
> 
>  You're missing a ChangeLog for the patch.
> >>> Sorry, updated in this patch.
> 
>  However, I'm not sure about this.  The register shift form of VSHL
>  performs a right shift if the value is negative, which is UB if you
>  write `<<` instead.
> 
>  Have I missed something here?
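To make the concern concrete, here is a sketch (not from the patch) of a scalar model of one signed 32-bit lane of the register-shift form of VSHL, assuming, as stated in the message above, that a negative count means a right shift. The function name `vshl_reg_lane_model` is hypothetical and only counts in (-32, 32) are modelled.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical scalar model of one signed 32-bit lane of the
// register-shift form of VSHL: a non-negative count shifts left and a
// negative count shifts right.  Only counts in (-32, 32) are modelled.
static std::int32_t vshl_reg_lane_model (std::int32_t a, std::int32_t count)
{
  assert (count > -32 && count < 32);
  if (count >= 0)
    // Shift as unsigned: left-shifting a negative signed value is
    // undefined before C++20.
    return static_cast<std::int32_t> (static_cast<std::uint32_t> (a) << count);
  // Arithmetic right shift (GCC's behaviour for negative operands).
  return a >> -count;
}
```

By contrast, `a << -1` on a plain `int32_t` is undefined behaviour, so `__a << __b` cannot reproduce this right-shift behaviour once a negative `__b` reaches it.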
> >>> Hi Richard,
> >>> According to this article:
> >>> https://developer.arm.com/documentation/den0018/a/NEON-Intrinsics-Reference/Shift/VSHL-N
> >>> For vshl_n, the shift amount is always in the non-negative range for all 
> >>> types.
> >>>
> >>> I tried using vshl_n_s32 (a, -1), and the compiler emitted the following
> >>> diagnostic:
> >>> foo.c: In function ‘main’:
> >>> foo.c:17:1: error: constant -1 out of range 0 - 31
> >>>    17 | }
> >>>       | ^
> >>>
> >>
> >> It does do that now, but that's because the intrinsic expansion does
> >> some bounds checking; when you remove the call into the back-end
> >> intrinsic that will no longer happen.
> >>
> >> I think with this change various things are likely:
> >>
> >> - We'll no longer reject non-immediate values, so users will be able to
> >> write
> >>
> >>   int b = 5;
> >>  vshl_n_s32 (a, b);
> >>
> >> which will expand to a vdup followed by the register form.
> >>
> >> - we'll rely on the front-end diagnosing out-of-range shifts
> >>
> >> - code of the form
> >>
> >>  int b = -1;
> >>  vshl_n_s32 (a, b);
> >>
> >> will probably now go through without any errors, especially at low
> >> optimization levels.  It may end up doing what the user wanted, but it's
> >> definitely a change in behaviour - and perhaps worse, the compiler might
> >> diagnose the above as UB and silently throw some stuff away.
> >>
> >> It might be that we need to insert some form of static assertion that
> >> the second argument is a __builtin_constant_p().
> > Ah right, thanks for the suggestions!
> > I tried the above example:
> > int b = -1;
> > vshl_n_s32 (a, b);
> > and it compiled without any errors with -O0 after the patch.
> >
> > Would it be OK to use _Static_assert (__builtin_constant_p (b)) to
> > guard against non-immediate values?
> >
> > With the following change:
> > __extension__ extern __inline int32x2_t
> > __attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
> > vshl_n_s32 (int32x2_t __a, const int __b)
> > {
> >   _Static_assert (__builtin_constant_p (__b));
> >   return __a << __b;
> > }
> >
> > the above example fails at -O0:
> > ../armhf-build/gcc/include/arm_neon.h: In function ‘vshl_n_s32’:
> > ../armhf-build/gcc/include/arm_neon.h:4904:3: error: static assertion failed
> >   4904 |   _Static_assert (__builtin_constant_p (__b));
> >        |   ^~
>
> I've been playing with that but unfortunately it doesn't seem to work in
> the way we want it to.  For a complete test:
>
>
>
> typedef __simd64_int32_t int32x2_t;
>
> __extension__ extern __inline int32x2_t
> __attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
> vshl_n_s32 (int32x2_t __a, const int __b)
> {
>   _Static_assert (__builtin_constant_p (__b), "Second argument must be a litteral constant");
>   return __a << __b;
> }
>
> int32x2_t f (int32x2_t x, const int b)
> {
>   return vshl_n_s32 (x, 1);
> }
>
> At -O0 I get:
>
> test.c: In function ‘vshl_n_s32’:
> test.c:7:3: error: static assertion failed: "Second argument must be a litteral constant"
>     7 |   _Static_assert (__builtin_constant_p (__b), "Second argument must be a litteral constant");
>       |   ^~
>
> While at -O1 and above I get:
>
>
> test.c: In function ‘vshl_n_s32’:
> test.c:7:19: error: expression in static assertion is not constant
>     7 |   _Static_assert (__builtin_constant_p (__b), "Second argument must be a litteral constant");
>       |                   ^~
>
> Which indicates that it do

Re: [PATCH v2] gcov: Add __gcov_info_to_gcda()

2021-07-23 Thread Sebastian Huber

On 23/07/2021 08:52, Martin Liška wrote:

+#ifdef NEED_L_GCOV_INFO_TO_GCDA
+/* Convert the gcov info to a gcda data stream.  It is intended for
+   free-standing environments which do not support the C library file I/O.  */
+
+void
+__gcov_info_to_gcda (const struct gcov_info *gi_ptr,
+ void (*filename) (const char *, void *),


What about begin_filename_fn?


+ void (*dump) (const void *, unsigned, void *),
+ void *(*allocate) (unsigned, void *),
+ void *arg)
+{
+  (*filename) (gi_ptr->filename, arg);
+  write_one_data (gi_ptr, NULL, dump, allocate, arg);
+}
+#endif /* NEED_L_GCOV_INFO_TO_GCDA */
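As an illustration only: given the callback signature in the hunk above, a freestanding target might supply callbacks along these lines. Everything here (`gcda_sink`, `sink_filename`, `sink_dump`, `sink_allocate`, the buffer sizes) is hypothetical and not part of libgcov; the sketch records the file name, appends the data stream to a fixed buffer, and serves allocations from a static bump arena.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical state passed through the `void *arg` parameter.
struct gcda_sink
{
  char name[64];                         // last file name reported
  unsigned char data[256];               // collected .gcda bytes
  unsigned used;                         // bytes of DATA in use
  alignas (8) unsigned char arena[128];  // storage for sink_allocate
  unsigned arena_used;
};

// Shape of the `filename` callback: void (const char *, void *).
static void sink_filename (const char *name, void *arg)
{
  gcda_sink *s = static_cast<gcda_sink *> (arg);
  std::strncpy (s->name, name, sizeof s->name - 1);
  s->name[sizeof s->name - 1] = '\0';
}

// Shape of the `dump` callback: void (const void *, unsigned, void *).
static void sink_dump (const void *buf, unsigned len, void *arg)
{
  gcda_sink *s = static_cast<gcda_sink *> (arg);
  assert (s->used + len <= sizeof s->data);  // overflow is not handled
  std::memcpy (s->data + s->used, buf, len);
  s->used += len;
}

// Shape of the `allocate` callback: void *(unsigned, void *).
// A bump allocator over a static arena; memory is never freed.
static void *sink_allocate (unsigned size, void *arg)
{
  gcda_sink *s = static_cast<gcda_sink *> (arg);
  unsigned rounded = (size + 7u) & ~7u;  // keep 8-byte alignment
  if (s->arena_used + rounded > sizeof s->arena)
    return nullptr;
  void *p = s->arena + s->arena_used;
  s->arena_used += rounded;
  return p;
}
```

With the patch applied, these would be passed as `__gcov_info_to_gcda (gi_ptr, sink_filename, sink_dump, sink_allocate, &sink)`; how the target obtains each `gcov_info` pointer is not covered in this thread.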



About gcov_write_summary: it should be also dumped in order to have a 
complete .gcda file, right?


How can I get access to the summary information? Here it is not 
available:


You only need to change gcov_write_summary in gcov-io.c.


Sorry, I still don't know how I can get the summary information if I 
only have a pointer to the gcov_info structure which does not contain a 
summary member.


--
embedded brains GmbH
Herr Sebastian HUBER
Dornierstr. 4
82178 Puchheim
Germany
email: sebastian.hu...@embedded-brains.de
phone: +49-89-18 94 741 - 16
fax:   +49-89-18 94 741 - 08

Registergericht: Amtsgericht München
Registernummer: HRB 157899
Vertretungsberechtigte Geschäftsführer: Peter Rasmussen, Thomas Dörfler
Unsere Datenschutzerklärung finden Sie hier:
https://embedded-brains.de/datenschutzerklaerung/


Re: [PATCH v2] gcov: Add __gcov_info_to_gcda()

2021-07-23 Thread Martin Liška

On 7/23/21 9:06 AM, Sebastian Huber wrote:

On 23/07/2021 08:52, Martin Liška wrote:

+#ifdef NEED_L_GCOV_INFO_TO_GCDA
+/* Convert the gcov info to a gcda data stream.  It is intended for
+   free-standing environments which do not support the C library file I/O.  */
+
+void
+__gcov_info_to_gcda (const struct gcov_info *gi_ptr,
+ void (*filename) (const char *, void *),


What about begin_filename_fn?


+ void (*dump) (const void *, unsigned, void *),
+ void *(*allocate) (unsigned, void *),
+ void *arg)
+{
+  (*filename) (gi_ptr->filename, arg);
+  write_one_data (gi_ptr, NULL, dump, allocate, arg);
+}
+#endif /* NEED_L_GCOV_INFO_TO_GCDA */



About gcov_write_summary: it should be also dumped in order to have a complete 
.gcda file, right?


How can I get access to the summary information? Here it is not available:


You only need to change gcov_write_summary in gcov-io.c.


Sorry, I still don't know how I can get the summary information if I only have 
a pointer to the gcov_info structure which does not contain a summary member.


You're right, sorry! But in your case, it will be simple to re-create it with
a script on the host system.


gcov_write_summary (gcov_unsigned_t tag, const struct gcov_summary *summary)
{
  gcov_write_tag_length (tag, GCOV_TAG_SUMMARY_LENGTH);
  gcov_write_unsigned (summary->runs);
  gcov_write_unsigned (summary->sum_max);
}

Here summary->runs will be 1 and sum_max is the maximum counter value during the run.
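Following that suggestion, a host-side tool could rebuild the summary itself. A minimal sketch, assuming the counters of the single run are already available as an array; `summary_sketch` and `rebuild_summary` are hypothetical names, and the field widths are chosen for illustration rather than copied from gcov-io.h.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the two fields written by
// gcov_write_summary: the number of runs and the maximum counter.
struct summary_sketch
{
  std::uint32_t runs;
  std::uint64_t sum_max;
};

// Rebuild the summary for a single run: runs is 1 and sum_max is the
// largest counter value observed during that run.
static summary_sketch rebuild_summary (const std::uint64_t *counters,
                                       std::size_t n)
{
  summary_sketch s = {1, 0};
  for (std::size_t i = 0; i < n; i++)
    if (counters[i] > s.sum_max)
      s.sum_max = counters[i];
  return s;
}
```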

Cheers,
Martin


[PATCH] expmed: Fix store_integral_bit_field [PR101562]

2021-07-23 Thread Jakub Jelinek via Gcc-patches
Hi!

Our documentation says that paradoxical subregs shouldn't appear
in strict_low_part:
'(strict_low_part (subreg:M (reg:N R) 0))'
 This expression code is used in only one context: as the
 destination operand of a 'set' expression.  In addition, the
 operand of this expression must be a non-paradoxical 'subreg'
 expression.
but on the testcase below, which triggers UB at runtime,
store_integral_bit_field emits exactly that.

The following patch fixes it by ensuring the requirement is satisfied.
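The new guard can be pictured with a toy model in which a mode is represented only by its precision in bits. This is a simplification, assuming that `paradoxical_subreg_p` amounts to "outer mode wider than inner mode"; the names below are hypothetical, not GCC's `machine_mode` machinery, and the `validate_subreg` half of the condition is omitted.

```cpp
#include <cassert>

// Toy stand-in for a machine mode: only its precision in bits.
struct toy_mode { unsigned precision; };

// A subreg is paradoxical when the outer mode is wider than the
// inner register's mode, e.g. (subreg:HI (reg:QI ...) 0).
static bool toy_paradoxical_subreg_p (toy_mode outer, toy_mode inner)
{
  return outer.precision > inner.precision;
}

// Mirrors the new condition: strict_low_part needs a non-paradoxical
// subreg operand, so only then may movstrict_optab be considered.
static bool toy_can_use_strict_low_part (toy_mode field, toy_mode reg)
{
  return !toy_paradoxical_subreg_p (field, reg);
}
```

In the testcase, the `short` store into a one-byte `struct S` is the kind of wider-than-the-register access this check now rejects.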

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2021-07-23  Jakub Jelinek  

PR rtl-optimization/101562
* expmed.c (store_integral_bit_field): Only use movstrict_optab
if the operand isn't paradoxical.

* gcc.c-torture/compile/pr101562.c: New test.

--- gcc/expmed.c.jj 2021-03-04 19:38:00.0 +0100
+++ gcc/expmed.c	2021-07-22 11:13:00.996420515 +0200
@@ -921,7 +921,10 @@ store_integral_bit_field (rtx op0, opt_s
}
 
   subreg_off = bitnum / BITS_PER_UNIT;
-  if (validate_subreg (fieldmode, GET_MODE (arg0), arg0, subreg_off))
+  if (validate_subreg (fieldmode, GET_MODE (arg0), arg0, subreg_off)
+      /* STRICT_LOW_PART must have a non-paradoxical subreg as
+         operand.  */
+      && !paradoxical_subreg_p (fieldmode, GET_MODE (arg0)))
     {
       arg0 = gen_rtx_SUBREG (fieldmode, arg0, subreg_off);
 
--- gcc/testsuite/gcc.c-torture/compile/pr101562.c.jj	2021-07-22 11:22:55.745962043 +0200
+++ gcc/testsuite/gcc.c-torture/compile/pr101562.c	2021-07-22 11:22:15.839529580 +0200
@@ -0,0 +1,21 @@
+/* PR rtl-optimization/101562 */
+
+struct S { char c; };
+void baz (struct S a, struct S b);
+
+void
+foo (void)
+{
+  struct S x[1];
+  *(short *)&x[0] = 256;
+  baz (x[0], x[1]);
+}
+
+void
+bar (void)
+{
+  struct S x[1];
+  x[0].c = 0;
+  x[1].c = 1;
+  baz (x[0], x[1]);
+}

Jakub



Re: [PATCH] PR fortran/101536 - ICE in gfc_conv_expr_descriptor, at fortran/trans-array.c:7324

2021-07-23 Thread Tobias Burnus

Hi Harald,

On 22.07.21 21:03, Harald Anlauf wrote:

you are right in that I was barking up the wrong tree.
I was focussed too much on the testcase in the PR.
[...]
Well, I tried and this does not work.


Which makes sense if one thinks about it:

When using 'a(5,:)', the parser already sets e->rank = 1,
while for 'a', the 'a' is the class wrapper with rank == 0, and
then overriding e->rank with CLASS_DATA(e)->as.rank
and adding AR_FULL makes sense.


However, an additional plain check on e->rank != 0 also in the
CLASS cases fixes the original issue as well as your example:

[...]

And regtests ok. :-)
See attached updated patch.


I think you still need to remove the 'return true;' from
the 'if (e->rank != 0 && e->ts.type == BT_CLASS' block – to
fall through to the e->rank check after the block.
(When 'return true;' is gone, the '{' and '}' can also be removed.)

Reason: Assume 'CLASS(...) x'. In this case, 'x' is a scalar.
And even after calling gfc_add_class_array_ref it remains
a scalar and e->rank == 0.

Or in other words: I think with your current patch,
class(u)  :: z
f = size (z)
is wrongly accepted without an error.

Thus: OK with a scalar CLASS entry added which gives an error,
which I believe requires the removal of the 'return true;' line.

Thanks for the patch – and I find it surprising how many
combinations exist which all can go wrong.

Tobias

-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955


[PATCH] c++: Accept C++11 attribute-definition [PR101582]

2021-07-23 Thread Jakub Jelinek via Gcc-patches
Hi!

As the following testcase shows, we don't properly parse
C++11 attribute-declarations:
https://eel.is/c++draft/dcl.dcl#nt:attribute-declaration

cp_parser_toplevel_declaration just handles empty-declaration parsing
(with diagnostics for C++98) and otherwise calls cp_parser_declaration,
which in turn calls cp_parser_simple_declaration, which rejects it with a
"does not declare anything" permerror.

The following patch instead handles it in cp_parser_toplevel_declaration
by parsing the attributes (standard ones only; we've never supported
__attribute__((...)); at namespace scope, so I'm not sure we need to
introduce that), which for C++98 emits the needed diagnostics, and then
warns if there are any attributes that we throw away on the floor.

I'll need this later for OpenMP directives at namespace scope, e.g.
[[omp::directive (requires, atomic_default_mem_order(seq_cst))]];
should be valid at namespace scope (and many other directives).

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2021-07-23  Jakub Jelinek  

PR c++/101582
* parser.c (cp_parser_skip_std_attribute_spec_seq): Add a forward
declaration.
(cp_parser_toplevel_declaration): Parse attribute-declaration.

* g++.dg/cpp0x/gen-attrs-45.C: Expect a warning about ignored
attributes instead of error.
* g++.dg/cpp0x/gen-attrs-75.C: New test.

--- gcc/cp/parser.c.jj  2021-07-22 17:47:26.025761491 +0200
+++ gcc/cp/parser.c 2021-07-22 19:09:28.487513184 +0200
@@ -2507,6 +2507,8 @@ static tree cp_parser_std_attribute_spec
   (cp_parser *);
 static tree cp_parser_std_attribute_spec_seq
   (cp_parser *);
+static size_t cp_parser_skip_std_attribute_spec_seq
+  (cp_parser *, size_t);
 static size_t cp_parser_skip_attributes_opt
   (cp_parser *, size_t);
 static bool cp_parser_extension_opt
@@ -14547,6 +14549,20 @@ cp_parser_toplevel_declaration (cp_parse
       if (cxx_dialect < cxx11)
         pedwarn (input_location, OPT_Wpedantic, "extra %<;%>");
     }
+  else if (cp_lexer_nth_token_is (parser->lexer,
+                                  cp_parser_skip_std_attribute_spec_seq (parser,
+                                                                         1),
+                                  CPP_SEMICOLON))
+    {
+      location_t attrs_loc = token->location;
+      tree std_attrs = cp_parser_std_attribute_spec_seq (parser);
+      if (std_attrs != NULL_TREE)
+        warning_at (make_location (attrs_loc, attrs_loc, parser->lexer),
+                    OPT_Wattributes,
+                    "attributes in attribute declaration are ignored");
+      if (cp_lexer_next_token_is (parser->lexer, CPP_SEMICOLON))
+        cp_lexer_consume_token (parser->lexer);
+    }
   else
     /* Parse the declaration itself.  */
     cp_parser_declaration (parser, NULL_TREE);
--- gcc/testsuite/g++.dg/cpp0x/gen-attrs-45.C.jj	2020-01-12 11:54:37.072403466 +0100
+++ gcc/testsuite/g++.dg/cpp0x/gen-attrs-45.C	2021-07-22 19:14:38.250222344 +0200
+0200
@@ -1,4 +1,4 @@
 // PR c++/52906
 // { dg-do compile { target c++11 } }
 
-[[gnu::deprecated]]; // { dg-error "does not declare anything" }
+[[gnu::deprecated]]; // { dg-warning "attributes in attribute declaration are ignored" }
--- gcc/testsuite/g++.dg/cpp0x/gen-attrs-75.C.jj	2021-07-22 19:14:58.438942693 +0200
+++ gcc/testsuite/g++.dg/cpp0x/gen-attrs-75.C	2021-07-22 19:12:18.442158972 +0200
+0200
@@ -0,0 +1,8 @@
+// PR c++/101582
+// { dg-do compile }
+// { dg-options "" }
+
+;
+[[]] [[]] [[]];	// { dg-warning "attributes only available with" "" { target c++98_only } }
+[[foobar]];	// { dg-warning "attributes in attribute declaration are ignored" }
+// { dg-warning "attributes only available with" "" { target c++98_only } .-1 }

Jakub



[committed] openmp: Diagnose invalid mixing of the attribute and pragma syntax directives

2021-07-23 Thread Jakub Jelinek via Gcc-patches
Hi!

The OpenMP 5.1 spec says that the attribute and pragma syntax directives
should not be mixed on the same statement.  The following patch adds a
diagnostic for that:
  [[omp::directive (...)]]
  #pragma omp ...
is always an error and for the other order
  #pragma omp ...
  [[omp::directive (...)]]
it depends on whether the pragma directive is an OpenMP construct
(then it is an error because it needs a structured block or loop
or statement as body) or e.g. a standalone directive (then it is fine).

Only block scope is handled for now, though; namespace scope and class scope
still need even the basic support to be implemented.

Bootstrapped/regtested on x86_64-linux and i686-linux, committed to trunk.

2021-07-23  Jakub Jelinek  

gcc/c-family/
* c-pragma.h (enum pragma_kind): Add PRAGMA_OMP__START_ and
PRAGMA_OMP__LAST_ enumerators.
gcc/cp/
* parser.h (struct cp_parser): Add omp_attrs_forbidden_p member.
* parser.c (cp_parser_handle_statement_omp_attributes): Diagnose
mixing of attribute and pragma syntax directives when seeing
omp::directive if parser->omp_attrs_forbidden_p or if attribute syntax
directives are followed by OpenMP pragma.
(cp_parser_statement): Clear parser->omp_attrs_forbidden_p after
the cp_parser_handle_statement_omp_attributes call.
(cp_parser_omp_structured_block): Add disallow_omp_attrs argument,
if true, set parser->omp_attrs_forbidden_p.
(cp_parser_omp_scan_loop_body, cp_parser_omp_sections_scope): Pass
false as disallow_omp_attrs to cp_parser_omp_structured_block.
(cp_parser_omp_parallel, cp_parser_omp_task): Set
parser->omp_attrs_forbidden_p.
gcc/testsuite/
* g++.dg/gomp/attrs-4.C: New test.
* g++.dg/gomp/attrs-5.C: New test.
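The c-pragma.h hunk below relies on a common enum idiom: sentinel enumerators that alias the first and last members of a contiguous group, so that membership becomes a simple range comparison. A stand-alone sketch with a hypothetical, much shorter enum (the sentinels are spelled `_FIRST_`/`_LAST_` here because identifiers containing `__` are reserved in user C++ code):

```cpp
#include <cassert>

// Hypothetical miniature of enum pragma_kind.  An enumerator with an
// explicit initializer may repeat an existing value, so the sentinels
// consume no new values; the next enumerator continues from the alias.
enum toy_pragma_kind
{
  TOY_PRAGMA_NONE,
  TOY_PRAGMA_OACC_WAIT,
  TOY_PRAGMA_OMP_ALLOCATE,
  TOY_PRAGMA_OMP_FIRST_ = TOY_PRAGMA_OMP_ALLOCATE,
  TOY_PRAGMA_OMP_ATOMIC,
  TOY_PRAGMA_OMP_TEAMS,
  TOY_PRAGMA_OMP_LAST_ = TOY_PRAGMA_OMP_TEAMS,
  TOY_PRAGMA_IVDEP
};

// True for every OpenMP kind and nothing else, provided all OpenMP
// enumerators stay between the two sentinels.
static bool toy_is_omp_pragma (toy_pragma_kind kind)
{
  return kind >= TOY_PRAGMA_OMP_FIRST_ && kind <= TOY_PRAGMA_OMP_LAST_;
}
```

The price of the idiom is the maintenance rule spelled out in the new comments: a future OpenMP pragma added before the first or after the last member must also move the corresponding sentinel.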

--- gcc/c-family/c-pragma.h.jj  2021-07-22 12:37:20.409533286 +0200
+++ gcc/c-family/c-pragma.h 2021-07-22 12:44:57.903028283 +0200
@@ -42,7 +42,9 @@ enum pragma_kind {
   PRAGMA_OACC_UPDATE,
   PRAGMA_OACC_WAIT,
 
+  /* PRAGMA_OMP__START_ should be equal to the first PRAGMA_OMP_* code.  */
   PRAGMA_OMP_ALLOCATE,
+  PRAGMA_OMP__START_ = PRAGMA_OMP_ALLOCATE,
   PRAGMA_OMP_ATOMIC,
   PRAGMA_OMP_BARRIER,
   PRAGMA_OMP_CANCEL,
@@ -72,6 +74,8 @@ enum pragma_kind {
   PRAGMA_OMP_TASKYIELD,
   PRAGMA_OMP_THREADPRIVATE,
   PRAGMA_OMP_TEAMS,
+  /* PRAGMA_OMP__LAST_ should be equal to the last PRAGMA_OMP_* code.  */
+  PRAGMA_OMP__LAST_ = PRAGMA_OMP_TEAMS,
 
   PRAGMA_GCC_PCH_PREPROCESS,
   PRAGMA_IVDEP,
--- gcc/cp/parser.h.jj  2021-07-02 21:59:12.350171752 +0200
+++ gcc/cp/parser.h 2021-07-22 15:26:41.905013091 +0200
@@ -398,6 +398,9 @@ struct GTY(()) cp_parser {
  identifiers) rather than an explicit template parameter list.  */
   bool fully_implicit_function_template_p;
 
+  /* TRUE if omp::directive or omp::sequence attributes may not appear.  */
+  bool omp_attrs_forbidden_p;
+
   /* Tracks the function's template parameter list when declaring a function
      using generic type parameters.  This is either a new chain in the case of a
      fully implicit function template or an extension of the function's existing
--- gcc/cp/parser.c.jj  2021-07-22 12:37:20.445532774 +0200
+++ gcc/cp/parser.c 2021-07-22 17:47:26.025761491 +0200
@@ -11665,6 +11665,7 @@ cp_parser_handle_statement_omp_attribute
   auto_vec vec;
   int cnt = 0;
   int tokens = 0;
+  bool bad = false;
   for (tree *pa = &attrs; *pa; )
     if (get_attribute_namespace (*pa) == omp_identifier
         && is_attribute_p ("directive", get_attribute_name (*pa)))
@@ -11676,6 +11677,14 @@ cp_parser_handle_statement_omp_attribute
         gcc_assert (TREE_CODE (d) == DEFERRED_PARSE);
         cp_token *first = DEFPARSE_TOKENS (d)->first;
         cp_token *last = DEFPARSE_TOKENS (d)->last;
+        if (parser->omp_attrs_forbidden_p)
+          {
+            error_at (first->location,
+                      "mixing OpenMP directives with attribute and pragma "
+                      "syntax on the same statement");
+            parser->omp_attrs_forbidden_p = false;
+            bad = true;
+          }
         const char *directive[3] = {};
         for (int i = 0; i < 3; i++)
           {
@@ -11731,6 +11740,9 @@ cp_parser_handle_statement_omp_attribute
 else
   pa = &TREE_CHAIN (*pa);
 
+  if (bad)
+    return attrs;
+
   unsigned int i;
   cp_omp_attribute_data *v;
   cp_omp_attribute_data *construct_seen = nullptr;
@@ -11780,6 +11792,18 @@ cp_parser_handle_statement_omp_attribute
" can only appear on an empty statement");
   return attrs;
 }
+  if (cnt && cp_lexer_next_token_is (parser->lexer, CPP_PRAGMA))
+    {
+      cp_token *token = cp_lexer_peek_token (parser->lexer);
+      enum pragma_kind kind = cp_parser_pragma_kind (token);
+      if (kind >= PRAGMA_OMP__START_ && kind <= PRAGMA_OMP__LAST_)
+        {
+          error_at (token->location,
+   "mixing OpenMP

[committed] openmp: Add support for __has_attribute(omp::directive) and __has_attribute(omp::sequence)

2021-07-23 Thread Jakub Jelinek via Gcc-patches
Hi!

Now that the C++ FE supports these attributes, though not by registering
them in the attribute tables (they work quite differently from other
attributes), this patch teaches c_common_has_attribute about them.

Bootstrapped/regtested on x86_64-linux and i686-linux, committed to trunk.

2021-07-23  Jakub Jelinek  

* c-lex.c (c_common_has_attribute): Call canonicalize_attr_name also
on attr_id.  Return 1 for omp::directive or omp::sequence in C++11
and later.

* c-c++-common/gomp/attrs-1.c: New test.
* c-c++-common/gomp/attrs-2.c: New test.
* c-c++-common/gomp/attrs-3.c: New test.
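The ChangeLog relies on canonicalize_attr_name to map spellings such as `__directive__` onto `directive`. A rough, string-based sketch of that normalization, assuming the rule is "strip one leading and one trailing `__` when both are present and the name is longer than four characters"; `toy_canonicalize_attr_name` is hypothetical and works on `std::string` rather than GCC identifiers.

```cpp
#include <cassert>
#include <string>

// Strip one leading "__" and one trailing "__" when both are present
// and something remains in between.
static std::string toy_canonicalize_attr_name (const std::string &name)
{
  const std::string::size_type l = name.size ();
  if (l > 4 && name.compare (0, 2, "__") == 0
      && name.compare (l - 2, 2, "__") == 0)
    return name.substr (2, l - 4);
  return name;
}
```

This is why the new test expects `__omp__::__directive__`, `omp::__directive__` and `__omp__::sequence` to behave like the plain spellings.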

--- gcc/c-family/c-lex.c.jj 2021-05-21 10:34:09.046563955 +0200
+++ gcc/c-family/c-lex.c	2021-07-22 15:16:37.340412532 +0200
@@ -338,7 +338,20 @@ c_common_has_attribute (cpp_reader *pfil
                   tree attr_id
                     = get_identifier ((const char *)
                                       cpp_token_as_text (pfile, nxt_token));
-                  attr_name = build_tree_list (attr_ns, attr_id);
+                  attr_id = canonicalize_attr_name (attr_id);
+                  if (c_dialect_cxx ())
+                    {
+                      /* OpenMP attributes need special handling.  */
+                      if ((flag_openmp || flag_openmp_simd)
+                          && is_attribute_p ("omp", attr_ns)
+                          && (is_attribute_p ("directive", attr_id)
+                              || is_attribute_p ("sequence", attr_id)))
+                        result = 1;
+                    }
+                  if (result)
+                    attr_name = NULL_TREE;
+                  else
+                    attr_name = build_tree_list (attr_ns, attr_id);
                 }
               else
                 {
--- gcc/testsuite/c-c++-common/gomp/attrs-1.c.jj	2021-07-22 14:34:26.718586059 +0200
+++ gcc/testsuite/c-c++-common/gomp/attrs-1.c	2021-07-22 15:16:00.389926789 +0200
@@ -0,0 +1,146 @@
+/* { dg-do compile } */
+/* { dg-options "-fopenmp" } */
+
+#if __has_attribute(omp::directive)
+#ifndef __cplusplus
+#error omp::directive supported in C
+#endif
+#else
+#ifdef __cplusplus
+#error omp::directive not supported in C++
+#endif
+#endif
+
+#if __has_attribute(omp::sequence)
+#ifndef __cplusplus
+#error omp::sequence supported in C
+#endif
+#else
+#ifdef __cplusplus
+#error omp::sequence not supported in C++
+#endif
+#endif
+
+#if __has_attribute(omp::unknown)
+#error omp::unknown supported
+#endif
+
+#if __has_cpp_attribute(omp::directive)
+#ifndef __cplusplus
+#error omp::directive supported in C
+#endif
+#else
+#ifdef __cplusplus
+#error omp::directive not supported in C++
+#endif
+#endif
+
+#if __has_cpp_attribute(omp::sequence)
+#ifndef __cplusplus
+#error omp::sequence supported in C
+#endif
+#else
+#ifdef __cplusplus
+#error omp::sequence not supported in C++
+#endif
+#endif
+
+#if __has_cpp_attribute(omp::unknown)
+#error omp::unknown supported
+#endif
+
+#if __has_attribute(__omp__::__directive__)
+#ifndef __cplusplus
+#error __omp__::__directive__ supported in C
+#endif
+#else
+#ifdef __cplusplus
+#error __omp__::__directive__ not supported in C++
+#endif
+#endif
+
+#if __has_attribute(__omp__::__sequence__)
+#ifndef __cplusplus
+#error __omp__::__sequence__ supported in C
+#endif
+#else
+#ifdef __cplusplus
+#error __omp__::__sequence__ not supported in C++
+#endif
+#endif
+
+#if __has_attribute(__omp__::__unknown__)
+#error __omp__::__unknown__ supported
+#endif
+
+#if __has_cpp_attribute(__omp__::__directive__)
+#ifndef __cplusplus
+#error __omp__::__directive__ supported in C
+#endif
+#else
+#ifdef __cplusplus
+#error __omp__::__directive__ not supported in C++
+#endif
+#endif
+
+#if __has_cpp_attribute(__omp__::__sequence__)
+#ifndef __cplusplus
+#error __omp__::__sequence__ supported in C
+#endif
+#else
+#ifdef __cplusplus
+#error __omp__::__sequence__ not supported in C++
+#endif
+#endif
+
+#if __has_cpp_attribute(__omp__::__unknown__)
+#error __omp__::__unknown__ supported
+#endif
+
+#if __has_attribute(omp::__directive__)
+#ifndef __cplusplus
+#error omp::__directive__ supported in C
+#endif
+#else
+#ifdef __cplusplus
+#error omp::__directive__ not supported in C++
+#endif
+#endif
+
+#if __has_attribute(__omp__::sequence)
+#ifndef __cplusplus
+#error __omp__::sequence supported in C
+#endif
+#else
+#ifdef __cplusplus
+#error __omp__::sequence not supported in C++
+#endif
+#endif
+
+#if __has_attribute(omp::__unknown__)
+#error omp::__unknown__ supported
+#endif
+
+#if __has_cpp_attribute(__omp__::directive)
+#ifndef __cplusplus
+#error __omp__::directive supported in C
+#endif
+#else
+#ifdef __cplusplus
+#error __omp__::directive not supported in C++
+#endif
+#endif
+
+#if __has_cpp_attribute(omp::__sequence__)
+#ifndef __cplusplus
+#error omp::__sequence__ supported in C
+#endif
+#else
+#ifdef __cplusplus
+#error omp::__sequence__ not supported in C++
+#endif
+#endif
+
+#if __has_cpp_attribute(__omp__::unknown)
+#error __omp__::unknown supported
+#endif
--- gcc/testsuite/c-c++-common/gomp/attrs-2.c.jj   

RE: [PATCH 3/4]AArch64: correct dot-product RTL patterns for aarch64.

2021-07-23 Thread Tamar Christina via Gcc-patches
Hi,

Sorry, it looks like I forgot to ask whether this is OK to backport to GCC 9, 10
and 11 after some stew.

Thanks,
Tamar

> -Original Message-
> From: Richard Sandiford 
> Sent: Thursday, July 22, 2021 7:11 PM
> To: Tamar Christina 
> Cc: gcc-patches@gcc.gnu.org; nd ; Richard Earnshaw
> ; Marcus Shawcroft
> ; Kyrylo Tkachov 
> Subject: Re: [PATCH 3/4]AArch64: correct dot-product RTL patterns for
> aarch64.
> 
> Tamar Christina  writes:
> > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> >
> > Ok for master?
> >
> > Thanks,
> > Tamar
> >
> > gcc/ChangeLog:
> >
> > * config/aarch64/aarch64-simd-builtins.def (sdot, udot): Rename to...
> > (sdot_prod, udot_prod): ... This.
> > * config/aarch64/aarch64-simd.md (aarch64_dot): Merged
> > into...
> > (dot_prod): ... this.
> > (aarch64_dot_lane, aarch64_dot_laneq):
> > Change operands order.
> > (sadv16qi): Use new operands order.
> > * config/aarch64/arm_neon.h (vdot_u32, vdotq_u32, vdot_s32,
> > vdotq_s32): Use new RTL ordering.
> 
> OK, thanks.
> 
> Richard
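For readers less familiar with the instructions whose RTL patterns are being reordered: each 32-bit lane of an SDOT/UDOT accumulator gains the sum of four byte-wise products. A plain scalar reference model of the 64-bit signed form `vdot_s32 (acc, a, b)`, written as a sketch with a hypothetical name and ordinary arrays instead of arm_neon.h types:

```cpp
#include <cassert>
#include <cstdint>

// Scalar model of vdot_s32 (acc, a, b): acc has 2 lanes, a and b have 8.
// Lane i accumulates a[4i+j] * b[4i+j] for j = 0..3.  Wrap-around of
// the accumulator on overflow is not modelled.
static void sdot_model (std::int32_t acc[2], const std::int8_t a[8],
                        const std::int8_t b[8])
{
  for (int i = 0; i < 2; i++)
    {
      std::int32_t sum = 0;
      for (int j = 0; j < 4; j++)
        sum += std::int32_t (a[4 * i + j]) * std::int32_t (b[4 * i + j]);
      acc[i] += sum;
    }
}
```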


Re: [PATCH] PR fortran/101564 - ICE in resolve_allocate_deallocate, at fortran/resolve.c:8169

2021-07-23 Thread Tobias Burnus

Hi Harald,

On 22.07.21 21:50, Harald Anlauf wrote:

I am afraid we're really opening a can of worms here

which is not too bad if there are only two earthworms in there ;-)

Additionally, I wonder whether that will work with:

I think a "working" testcase for this could be:

program p
   implicit none
   integer, target  :: ptr
   integer, pointer :: A
   allocate (A, stat=f())
   print *, ptr
contains
   function f()
 integer, pointer :: f
 f => ptr
   end function f
end

Indeed, that is what I meant.

This works as expected with Intel and AOCC, but gives a
syntax error with every gfortran tested because of match.c:

alloc_opt_list:
   m = gfc_match (" stat = %v", &tmp);


I think we can simply change that one to %e; the definable
check should ensure that any non variable (in the Fortran sense)
is rejected.

And we should update the comment for %v / match_variable to state
that it does not include function references.

In some cases, like with OpenMP, we still do not want to match
functions; hence, changing match_variable is probably not what we
want to do. Additionally, for all %v replaced by %e we need to
ensure that there is a definable check. (It should already be there,
as INTENT(IN) or named constants or ... are also invalid.)

Also affected: Some I/O items, a bunch of other stat=%v and
errmsg=%v.

Talking about errmsg: In the same function, the same check is
done for errmsg as for stat – hence, the patch should update
also errmsg.


Additionally, I have to admit that I do not understand the
following existing condition, which you did not touch:

if ((stat->ts.type != BT_INTEGER
 && !(stat->ref && (stat->ref->type == REF_ARRAY
|| stat->ref->type == REF_COMPONENT)))
|| stat->rank > 0)
  gfc_error ("Stat-variable at %L must be a scalar INTEGER "
 "variable", &stat->where);

I mean the ts.type != BT_INTEGER and stat->rank != 0 is clear,
but what's the reason for the refs?

Well, that needs to be answered by Steve (see commit 3759634).


(https://gcc.gnu.org/g:3759634f3208cbc1226bec19d22cbff989a287c3 (svn
r145331))

The reason for the ref checks is unclear and they seem to be wrong. The added
testcases also only use 'x' (real) and n or i (integer) as input, i.e.
they do not exercise this. I did not look up the patch email for the reasoning.

Thanks,

Tobias



[PATCH 1/8] aarch64: Use memcpy to copy vector tables in vqtbl[234] intrinsics

2021-07-23 Thread Jonathan Wright via Gcc-patches
Hi,

This patch uses __builtin_memcpy to copy vector structures instead of
building a new opaque structure one vector at a time in each of the
vqtbl[234] Neon intrinsics in arm_neon.h. This simplifies the header file
and also improves code generation - superfluous move instructions
were emitted for every register extraction/set in this additional
structure.

Add new code generation tests to verify that superfluous move
instructions are no longer generated for the vqtbl[234] intrinsics.
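The idea behind these vqtbl/vqtbx/vtbl changes can be shown outside arm_neon.h: when two aggregate types have the same size and layout, a single `__builtin_memcpy` of the whole object replaces a sequence of per-member copies. The sketch below uses GCC's generic vector extension; `table2` and `opaque_pair` are hypothetical stand-ins for a table type such as `uint8x16x2_t` and an opaque tuple such as `__builtin_aarch64_simd_oi`. The patch description reports that the real change removes superfluous moves; no claim is made here about the code generated for this sketch.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical 16-byte vector type using GCC's vector extension.
typedef unsigned char v16u8 __attribute__ ((vector_size (16)));

// Two unrelated types with identical size and layout, standing in
// for a user-visible table type and an opaque register-tuple type.
struct table2 { v16u8 val[2]; };
struct opaque_pair { v16u8 reg[2]; };

static opaque_pair to_opaque (const table2 &tab)
{
  static_assert (sizeof (opaque_pair) == sizeof (table2),
                 "a whole-object copy needs identical sizes");
  opaque_pair o;
  // One copy of the whole aggregate instead of setting each member.
  __builtin_memcpy (&o, &tab, sizeof tab);
  return o;
}
```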

Regression tested and bootstrapped on aarch64-none-linux-gnu - no
issues.

Ok for master?

Thanks,
Jonathan

---

gcc/ChangeLog:

2021-07-08  Jonathan Wright  

* config/aarch64/arm_neon.h (vqtbl2_s8): Use __builtin_memcpy
instead of constructing __builtin_aarch64_simd_oi one vector
at a time.
(vqtbl2_u8): Likewise.
(vqtbl2_p8): Likewise.
(vqtbl2q_s8): Likewise.
(vqtbl2q_u8): Likewise.
(vqtbl2q_p8): Likewise.
(vqtbl3_s8): Use __builtin_memcpy instead of constructing
__builtin_aarch64_simd_ci one vector at a time.
(vqtbl3_u8): Likewise.
(vqtbl3_p8): Likewise.
(vqtbl3q_s8): Likewise.
(vqtbl3q_u8): Likewise.
(vqtbl3q_p8): Likewise.
(vqtbl4_s8): Use __builtin_memcpy instead of constructing
__builtin_aarch64_simd_xi one vector at a time.
(vqtbl4_u8): Likewise.
(vqtbl4_p8): Likewise.
(vqtbl4q_s8): Likewise.
(vqtbl4q_u8): Likewise.
(vqtbl4q_p8): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/vector_structure_intrinsics.c: New test.


rb14639.patch
Description: rb14639.patch


[PATCH 2/8] aarch64: Use memcpy to copy vector tables in vqtbx[234] intrinsics

2021-07-23 Thread Jonathan Wright via Gcc-patches
Hi,

This patch uses __builtin_memcpy to copy vector structures instead of
building a new opaque structure one vector at a time in each of the
vqtbx[234] Neon intrinsics in arm_neon.h. This simplifies the header
file and also improves code generation - superfluous move
instructions were emitted for every register extraction/set in this
additional structure.

Add new code generation tests to verify that superfluous move
instructions are no longer generated for the vqtbx[234] intrinsics.

Regression tested and bootstrapped on aarch64-none-linux-gnu - no
issues.

Ok for master?

Thanks,
Jonathan

---

gcc/ChangeLog:

2021-07-08  Jonathan Wright  

* config/aarch64/arm_neon.h (vqtbx2_s8): Use __builtin_memcpy
instead of constructing __builtin_aarch64_simd_oi one vector
at a time.
(vqtbx2_u8): Likewise.
(vqtbx2_p8): Likewise.
(vqtbx2q_s8): Likewise.
(vqtbx2q_u8): Likewise.
(vqtbx2q_p8): Likewise.
(vqtbx3_s8): Use __builtin_memcpy instead of constructing
__builtin_aarch64_simd_ci one vector at a time.
(vqtbx3_u8): Likewise.
(vqtbx3_p8): Likewise.
(vqtbx3q_s8): Likewise.
(vqtbx3q_u8): Likewise.
(vqtbx3q_p8): Likewise.
(vqtbx4_s8): Use __builtin_memcpy instead of constructing
__builtin_aarch64_simd_xi one vector at a time.
(vqtbx4_u8): Likewise.
(vqtbx4_p8): Likewise.
(vqtbx4q_s8): Likewise.
(vqtbx4q_u8): Likewise.
(vqtbx4q_p8): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/vector_structure_intrinsics.c: New tests.

rb14640.patch
Description: rb14640.patch


[PATCH 3/8] aarch64: Use memcpy to copy vector tables in vtbl[34] intrinsics

2021-07-23 Thread Jonathan Wright via Gcc-patches
Hi,

This patch uses __builtin_memcpy to copy vector structures instead of
building a new opaque structure one vector at a time in each of the
vtbl[34] Neon intrinsics in arm_neon.h. This simplifies the header file
and also improves code generation - superfluous move instructions
were emitted for every register extraction/set in this additional
structure.
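
The behaviour that vtbl3/vtbl4 must preserve across this change can be written as a scalar reference model. This is an illustrative sketch, not arm_neon.h code, and the function name is hypothetical. The table registers are treated as one concatenated byte array, and any index at or beyond its end selects 0.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Illustrative scalar model of a VTBL-style lookup (hypothetical helper).
// TAB holds NREGS 8-byte table registers back to back (vtbl3: 3, vtbl4: 4).
// Each output byte is TAB[IDX[i]], or 0 if the index is out of range.
void tbl_model(const std::uint8_t *tab, std::size_t nregs,
               const std::uint8_t idx[8], std::uint8_t out[8]) {
  assert(nregs >= 1 && nregs <= 4);
  const std::size_t table_size = nregs * 8;
  for (int i = 0; i < 8; ++i) {
    const std::size_t k = idx[i];
    out[i] = k < table_size ? tab[k] : 0;
  }
}
```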

Regression tested and bootstrapped on aarch64-none-linux-gnu - no
issues.

Ok for master?

Thanks,
Jonathan

---

gcc/ChangeLog:

2021-07-08  Jonathan Wright  

* config/aarch64/arm_neon.h (vtbl3_s8): Use __builtin_memcpy
instead of constructing __builtin_aarch64_simd_oi one vector
at a time.
(vtbl3_u8): Likewise.
(vtbl3_p8): Likewise.
(vtbl4_s8): Likewise.
(vtbl4_u8): Likewise.
(vtbl4_p8): Likewise.

rb14673.patch
Description: rb14673.patch


[PATCH v3] Use range-based for loops for traversing loops

2021-07-23 Thread Kewen.Lin via Gcc-patches
Hi,

Compared to v2, this v3 removes the new CTOR taking struct loops *loops,
as Richi clarified.  I'd like to support that in a separate follow-up
patch by extending the existing CTOR with an optional argument loop_p
root.

Bootstrapped and regtested again on powerpc64le-linux-gnu P9,
x86_64-redhat-linux and aarch64-linux-gnu, also
bootstrapped again on ppc64le P9 with bootstrap-O3 config.

Is it ok for trunk?

BR,
Kewen
-
gcc/ChangeLog:

* cfgloop.h (as_const): New function.
(class loop_iterator): Rename to ...
(class loops_list): ... this.
(loop_iterator::next): Rename to ...
(loops_list::Iter::fill_curr_loop): ... this and adjust.
(loop_iterator::loop_iterator): Rename to ...
(loops_list::loops_list): ... this and adjust.
(loops_list::Iter): New class.
(loops_list::iterator): New type.
(loops_list::const_iterator): New type.
(loops_list::begin): New function.
(loops_list::end): Likewise.
(loops_list::begin const): Likewise.
(loops_list::end const): Likewise.
(FOR_EACH_LOOP): Remove.
(FOR_EACH_LOOP_FN): Remove.
* cfgloop.c (flow_loops_dump): Adjust FOR_EACH_LOOP* with range-based
for loop with loops_list instance.
(sort_sibling_loops): Likewise.
(disambiguate_loops_with_multiple_latches): Likewise.
(verify_loop_structure): Likewise.
* cfgloopmanip.c (create_preheaders): Likewise.
(force_single_succ_latches): Likewise.
* config/aarch64/falkor-tag-collision-avoidance.c
(execute_tag_collision_avoidance): Likewise.
* config/mn10300/mn10300.c (mn10300_scan_for_setlb_lcc): Likewise.
* config/s390/s390.c (s390_adjust_loops): Likewise.
* doc/loop.texi: Likewise.
* gimple-loop-interchange.cc (pass_linterchange::execute): Likewise.
* gimple-loop-jam.c (tree_loop_unroll_and_jam): Likewise.
* gimple-loop-versioning.cc (loop_versioning::analyze_blocks): Likewise.
(loop_versioning::make_versioning_decisions): Likewise.
* gimple-ssa-split-paths.c (split_paths): Likewise.
* graphite-isl-ast-to-gimple.c (graphite_regenerate_ast_isl): Likewise.
* graphite.c (canonicalize_loop_form): Likewise.
(graphite_transform_loops): Likewise.
* ipa-fnsummary.c (analyze_function_body): Likewise.
* ipa-pure-const.c (analyze_function): Likewise.
* loop-doloop.c (doloop_optimize_loops): Likewise.
* loop-init.c (loop_optimizer_finalize): Likewise.
(fix_loop_structure): Likewise.
* loop-invariant.c (calculate_loop_reg_pressure): Likewise.
(move_loop_invariants): Likewise.
* loop-unroll.c (decide_unrolling): Likewise.
(unroll_loops): Likewise.
* modulo-sched.c (sms_schedule): Likewise.
* predict.c (predict_loops): Likewise.
(pass_profile::execute): Likewise.
* profile.c (branch_prob): Likewise.
* sel-sched-ir.c (sel_finish_pipelining): Likewise.
(sel_find_rgns): Likewise.
* tree-cfg.c (replace_loop_annotate): Likewise.
(replace_uses_by): Likewise.
(move_sese_region_to_fn): Likewise.
* tree-if-conv.c (pass_if_conversion::execute): Likewise.
* tree-loop-distribution.c (loop_distribution::execute): Likewise.
* tree-parloops.c (parallelize_loops): Likewise.
* tree-predcom.c (tree_predictive_commoning): Likewise.
* tree-scalar-evolution.c (scev_initialize): Likewise.
(scev_reset): Likewise.
* tree-ssa-dce.c (find_obviously_necessary_stmts): Likewise.
* tree-ssa-live.c (remove_unused_locals): Likewise.
* tree-ssa-loop-ch.c (ch_base::copy_headers): Likewise.
* tree-ssa-loop-im.c (analyze_memory_references): Likewise.
(tree_ssa_lim_initialize): Likewise.
* tree-ssa-loop-ivcanon.c (canonicalize_induction_variables): Likewise.
* tree-ssa-loop-ivopts.c (tree_ssa_iv_optimize): Likewise.
* tree-ssa-loop-manip.c (get_loops_exits): Likewise.
* tree-ssa-loop-niter.c (estimate_numbers_of_iterations): Likewise.
(free_numbers_of_iterations_estimates): Likewise.
* tree-ssa-loop-prefetch.c (tree_ssa_prefetch_arrays): Likewise.
* tree-ssa-loop-split.c (tree_ssa_split_loops): Likewise.
* tree-ssa-loop-unswitch.c (tree_ssa_unswitch_loops): Likewise.
* tree-ssa-loop.c (gate_oacc_kernels): Likewise.
(pass_scev_cprop::execute): Likewise.
* tree-ssa-propagate.c (clean_up_loop_closed_phi): Likewise.
* tree-ssa-sccvn.c (do_rpo_vn): Likewise.
* tree-ssa-threadupdate.c
(jump_thread_path_registry::thread_through_all_blocks): Likewise.
* tree-vectorizer.c (vectorize_loops): Likewise.
* tree-vrp.c (vrp_asserts::find_assert_locations): Likewise.
---
 gcc/cfgloop.c |  19 +--
 gcc/cfgloop.h
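
The ChangeLog above replaces the FOR_EACH_LOOP macros with range-based for loops over a loops_list. A simplified, hypothetical model of that design (LoopsList and Loop are illustrative names, not GCC's classes) shows the pieces a range-based for loop needs: begin(), end(), and an iterator with operator*, operator++ and operator!=. In this model the list records loop numbers and looks each one up while iterating, so a loop removed in the meantime is simply skipped.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-in for GCC's struct loop.
struct Loop { int num; };

// Sketch of a loops_list-like container: it stores loop numbers and looks
// each one up in LARRAY while iterating, skipping null (removed) entries.
class LoopsList {
public:
  LoopsList(const std::vector<Loop *> &larray, std::vector<int> nums)
      : larray_(larray), to_visit_(std::move(nums)) {}

  class Iter {
  public:
    Iter(const LoopsList &list, std::size_t idx) : list_(list), idx_(idx) {
      skip_removed();
    }
    Loop *operator*() const { return list_.larray_[list_.to_visit_[idx_]]; }
    Iter &operator++() { ++idx_; skip_removed(); return *this; }
    bool operator!=(const Iter &rhs) const { return idx_ != rhs.idx_; }

  private:
    // Advance past numbers whose loop no longer exists.
    void skip_removed() {
      while (idx_ < list_.to_visit_.size()
             && list_.larray_[list_.to_visit_[idx_]] == nullptr)
        ++idx_;
    }
    const LoopsList &list_;
    std::size_t idx_;
  };

  Iter begin() const { return Iter(*this, 0); }
  Iter end() const { return Iter(*this, to_visit_.size()); }

private:
  const std::vector<Loop *> &larray_;
  std::vector<int> to_visit_;
};
```

Usage, assuming larray outlives the list: `for (Loop *l : LoopsList (larray, {0, 1, 2})) ...`.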

[PATCH] Make loops_list support an optional loop_p root

2021-07-23 Thread Kewen.Lin via Gcc-patches
on 2021/7/22 8:56 PM, Richard Biener wrote:
> On Tue, Jul 20, 2021 at 4:37 PM Kewen.Lin  wrote:
>>
>> Hi,
>>
>> This v2 has addressed some review comments/suggestions:
>>
>>   - Use "!=" instead of "<" in function operator!= (const Iter &rhs)
>>   - Add new CTOR loops_list (struct loops *loops, unsigned flags)
>> to support loop hierarchy tree rather than just a function,
>> and adjust to use loops* accordingly.
> 
> I actually meant struct loop *, not struct loops * ;)  At the point
> we pondered to make loop invariant motion work on single
> loop nests we gave up not only but also because it iterates
> over the loop nest but all the iterators only ever can process
> all loops, not say, all loops inside a specific 'loop' (and
> including that 'loop' if LI_INCLUDE_ROOT).  So the
> CTOR would take the 'root' of the loop tree as argument.
> 
> I see that doesn't trivially fit how loops_list works, at least
> not for LI_ONLY_INNERMOST.  But I guess FROM_INNERMOST
> could be adjusted to do ONLY_INNERMOST as well?
> 


Thanks for the clarification!  I just realized that the previous
version with struct loops* is problematic: all traversal is still
bounded by outer_loop == NULL.  I think what you expect is for the
given loop_p root to be respected as the boundary.  Since we only
record the loops' nums, I think we still need the function* fn?
So I added one optional argument, loop_p root, and updated the
visiting code accordingly.  Before this change, the visiting used
outer_loop == NULL as the termination condition, which naturally
includes the root itself; with a given root, we have to use the
root as the termination condition to avoid iterating onto its
next sibling, if one exists.
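
A small sketch of the boundary issue described above. The Loop struct is a hypothetical stand-in modeled on the inner, next and outer links of a loop tree, and preorder_from_root is an illustrative name, not code from the patch. The walk climbs back toward ROOT and stops there, rather than stopping only when no enclosing loop is left, so a sibling of ROOT is never reached.

```cpp
#include <cassert>
#include <vector>

// Hypothetical loop-tree node: first inner loop, next sibling, enclosing loop.
struct Loop {
  int num;
  Loop *inner;
  Loop *next;
  Loop *outer;
};

// Preorder walk of ROOT's subtree.  Reaching ROOT again ends the walk, so
// ROOT->next (a sibling outside the subtree) is never visited.
std::vector<int> preorder_from_root(const Loop *root, bool include_root) {
  std::vector<int> nums;
  if (include_root)
    nums.push_back(root->num);
  const Loop *l = root->inner;
  while (l) {
    nums.push_back(l->num);
    if (l->inner) {
      l = l->inner;
      continue;
    }
    // Climb until some loop has a next sibling, but never past ROOT.
    while (l != root && !l->next)
      l = l->outer;
    l = (l == root) ? nullptr : l->next;
  }
  return nums;
}
```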

For LI_ONLY_INNERMOST, I was wondering whether we could use
code like:

vec<loop_p, va_gc> *fn_loops = loops_for_fn (fn)->larray;
for (i = 0; vec_safe_iterate (fn_loops, i, &aloop); i++)
  if (aloop != NULL
      && aloop->inner == NULL
      && flow_loop_nested_p (tree_root, aloop))
    this->to_visit.quick_push (aloop->num);

It has a stable bound, but if the given root has only a few child
loops, it can be much worse when there are many loops in fn.
It seems impossible to predict the size of the given root's loop
hierarchy; maybe we could still use the original linear search for
the case root == loops_for_fn (fn)->tree_root?  But since this
visiting does not seem very performance critical, I chose to share
the code originally used for FROM_INNERMOST, hoping for better
readability and maintainability.
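
A sketch of sharing one innermost-first walk between the two orders. The Loop struct and walk_from_innermost are hypothetical illustrations, not the patch's code: a flag decides whether only loops without inner loops are recorded, and reaching ROOT ends the walk, so the root's siblings are never visited.

```cpp
#include <cassert>
#include <vector>

// Hypothetical loop-tree node: first inner loop, next sibling, enclosing loop.
struct Loop {
  int num;
  Loop *inner;
  Loop *next;
  Loop *outer;
};

// Innermost-first (post-order) walk of ROOT's subtree.  With ONLY_INNERMOST
// set, only loops that have no inner loops are recorded.
std::vector<int> walk_from_innermost(const Loop *root, bool only_innermost,
                                     bool include_root) {
  std::vector<int> nums;
  const Loop *l = root;
  while (l->inner)  // Descend to the first innermost loop.
    l = l->inner;
  for (;;) {
    const bool record = !only_innermost || !l->inner;
    if (record && (l != root || include_root))
      nums.push_back(l->num);
    if (l == root)
      break;
    if (l->next) {
      // Move to the sibling and descend to its first innermost loop.
      l = l->next;
      while (l->inner)
        l = l->inner;
    } else {
      // All inner loops of the enclosing loop are done; visit it next.
      l = l->outer;
    }
  }
  return nums;
}
```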

Bootstrapped and regtested on powerpc64le-linux-gnu P9,
x86_64-redhat-linux and aarch64-linux-gnu, also
bootstrapped on ppc64le P9 with bootstrap-O3 config.

Does the attached patch meet what you expect?

BR,
Kewen
-
gcc/ChangeLog:

* cfgloop.h (loops_list::loops_list): Add one optional argument root
and adjust accordingly.
diff --git a/gcc/cfgloop.h b/gcc/cfgloop.h
index 741df44ea51..f7148df1758 100644
--- a/gcc/cfgloop.h
+++ b/gcc/cfgloop.h
@@ -669,13 +669,15 @@ as_const (T &t)
 }
 
 /* A list for visiting loops, which contains the loop numbers instead of
-   the loop pointers.  The scope is restricted in function FN and the
-   visiting order is specified by FLAGS.  */
+   the loop pointers.  If the loop ROOT is offered (non-null), the visiting
+   will start from it, otherwise it would start from loops_for_fn (FN)
+   instead.  The scope is restricted in function FN and the visiting order
+   is specified by FLAGS.  */
 
 class loops_list
 {
 public:
-  loops_list (function *fn, unsigned flags);
+  loops_list (function *fn, unsigned flags, loop_p root = nullptr);
 
   template  class Iter
   {
@@ -782,71 +784,94 @@ loops_list::Iter::fill_curr_loop ()
 }
 
 /* Set up the loops list to visit according to the specified
-   function scope FN and iterating order FLAGS.  */
+   function scope FN and iterating order FLAGS.  If ROOT is
+   not null, the visiting would start from it, otherwise it
+   will start from tree_root of loops_for_fn (FN).  */
 
-inline loops_list::loops_list (function *fn, unsigned flags)
+inline loops_list::loops_list (function *fn, unsigned flags, loop_p root)
 {
   class loop *aloop;
-  unsigned i;
   int mn;
 
+  struct loops *loops = loops_for_fn (fn);
+  gcc_assert (!root || loops);
+
   this->fn = fn;
-  if (!loops_for_fn (fn))
+  if (!loops)
 return;
 
+  loop_p tree_root = root ? root : loops->tree_root;
+
   this->to_visit.reserve_exact (number_of_loops (fn));
-  mn = (flags & LI_INCLUDE_ROOT) ? 0 : 1;
+  mn = (flags & LI_INCLUDE_ROOT) ? -1 : tree_root->num;
 
-  if (flags & LI_ONLY_INNERMOST)
-{
-  for (i = 0; vec_safe_iterate (loops_for_fn (fn)->larray, i, &aloop); i++)
-   if (aloop != NULL
-   && aloop->inner == NULL
-   && aloop->num >= mn)
+  /* The helper function for LI_FROM_INNERMOST and LI_ONLY_INNERMOST
+ visiting, ONLY_PUSH_INNERMOST_P indicates whether only push
+ the innermost loop, it's true for LI_ONLY_INNERMOST vis

[PATCH v4] c++: Add gnu::diagnose_as attribute

2021-07-23 Thread Matthias Kretz
Hi Jason,

I found a few regressions from the last patch in the meantime. Version 4 of 
the patch is attached.

Questions:

1. I simplified the condition for calling dump_template_parms in 
dump_function_name. !DECL_FRIEND_PSEUDO_TEMPLATE_INSTANTIATION (t) is 
equivalent to DECL_USE_TEMPLATE (t) in this context; implying that 
dump_template_parms is unconditionally called with `primary = false`. Or am I 
missing something?

2. Given a DECL_TI_ARGS tree, can I query whether an argument was deduced or 
explicitly specified? I'm asking because I still consider diagnostics of 
function templates unfortunate. `template  void f()` is fine, as is 
`void f(T) [with T = float]`, but `void f() [with T = float]` could be better. 
I.e. if the template parameter appears somewhere in the function parameter 
list, dump_template_parms would only produce noise. If, however, the template 
parameter was given explicitly, it would be nice if it could show up 
accordingly in diagnostics.

3. When parsing tentatively and the parse is rejected, input_location is not 
reset, correct? In the attached patch I therefore made 
cp_parser_namespace_alias_definition reset input_location on a failed 
tentative parse. But it feels wrong. Shouldn't input_location be restored on 
cp_parser_parse_definitely?

--

This attribute overrides the diagnostics output string for the entity it
appertains to. The motivation is to improve QoI for library TS
implementations, where diagnostics have a very bad signal-to-noise ratio
due to the long namespaces involved.

With the attribute, it is possible to solve PR89370 and make
std::__cxx11::basic_string<_CharT, _Traits, _Alloc> appear as
std::string in diagnostic output without extra hacks to recognize the
type in the C++ frontend.

Signed-off-by: Matthias Kretz 

gcc/ChangeLog:

PR c++/89370
* doc/extend.texi: Document the diagnose_as attribute.
* doc/invoke.texi: Document -fno-diagnostics-use-aliases.

gcc/c-family/ChangeLog:

PR c++/89370
* c.opt (fdiagnostics-use-aliases): New diagnostics flag.

gcc/cp/ChangeLog:

PR c++/89370
* cp-tree.h: Add is_alias_template_p declaration.
* decl2.c (is_alias_template_p): New function. Determines
whether a given TYPE_DECL is actually an alias template that is
still missing its template_info.
(is_late_template_attribute): Decls with diagnose_as attribute
are early attributes only if they are alias templates.
* error.c (dump_scope): When printing the name of a namespace,
look for the diagnose_as attribute. If found, print the
associated string instead of calling dump_decl.
(dump_decl_name_or_diagnose_as): New function to replace
dump_decl (pp, DECL_NAME(t), flags) and inspect the tree for the
diagnose_as attribute before printing the DECL_NAME.
(dump_template_scope): New function. Prints the scope of a
template instance correctly applying diagnose_as attributes and
adjusting the list of template parms accordingly.
(dump_aggr_type): If the type has a diagnose_as attribute, print
the associated string instead of printing the original type
name. Print template parms only if the attribute was not applied
to the instantiation / full specialization. Delay call to
dump_scope until the diagnose_as attribute is found. If the
attribute has a second argument, use it to override the context
passed to dump_scope.
(dump_simple_decl): Call dump_decl_name_or_diagnose_as instead
of dump_decl.
(dump_decl): Ditto.
(lang_decl_name): Ditto.
(dump_function_decl): Walk the function's context list to
determine whether a call to dump_template_scope is required.
Ensure function templates diagnosed with pretty templates set
TFF_TEMPLATE_NAME to skip dump_template_parms.
(dump_function_name): Replace the function's identifier with the
diagnose_as attribute value, if set. Expand
DECL_FRIEND_PSEUDO_TEMPLATE_INSTANTIATION to DECL_USE_TEMPLATE
and consequently call dump_template_parms with primary = false.
(comparable_template_types_p): Consider the types not a template
if one carries a diagnose_as attribute.
(print_template_differences): Replace the identifier with the
diagnose_as attribute value on the most general template, if it
is set.
* name-lookup.c (handle_namespace_attrs): Handle the diagnose_as
attribute on namespaces. Ensure exactly one string argument.
Ensure previous diagnose_as attributes used the same name.
'diagnose_as' on namespace aliases are forwarded to the original
namespace. Support no-argument 'diagnose_as' on namespace
aliases.
(do_namespace_alias): Add attributes parameter and call
handle_namespace_attrs.
* name-lookup.h (do_namespace_alias): Add at

[PATCH 4/8] aarch64: Use memcpy to copy vector tables in vtbx4 intrinsics

2021-07-23 Thread Jonathan Wright via Gcc-patches
Hi,

This patch uses __builtin_memcpy to copy vector structures instead of
building a new opaque structure one vector at a time in each of the
vtbx4 Neon intrinsics in arm_neon.h. This simplifies the header file
and also improves code generation - superfluous move instructions
were emitted for every register extraction/set in this additional
structure.
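
The vtbx variants differ from vtbl only in how out-of-range indices are handled. A scalar sketch of that rule (illustrative only, with a hypothetical function name, not the arm_neon.h implementation): lanes whose index falls outside the table, 32 bytes for vtbx4, keep their previous value.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Illustrative scalar model of a VTBX-style lookup (hypothetical helper).
// TAB holds NREGS 8-byte registers back to back (vtbx4: 4, i.e. 32 bytes).
// An out-of-range index leaves the corresponding byte of DEST unchanged.
void tbx_model(std::uint8_t dest[8], const std::uint8_t *tab,
               std::size_t nregs, const std::uint8_t idx[8]) {
  assert(nregs >= 1 && nregs <= 4);
  const std::size_t table_size = nregs * 8;
  for (int i = 0; i < 8; ++i) {
    const std::size_t k = idx[i];
    if (k < table_size)
      dest[i] = tab[k];
  }
}
```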

Regression tested and bootstrapped on aarch64-none-linux-gnu - no
issues.

Ok for master?

Thanks,
Jonathan

---

gcc/ChangeLog:

2021-07-19  Jonathan Wright  

* config/aarch64/arm_neon.h (vtbx4_s8): Use __builtin_memcpy
instead of constructing __builtin_aarch64_simd_oi one vector
at a time.
(vtbx4_u8): Likewise.
(vtbx4_p8): Likewise.


rb14674.patch
Description: rb14674.patch


RE: [PATCH 1/8] aarch64: Use memcpy to copy vector tables in vqtbl[234] intrinsics

2021-07-23 Thread Kyrylo Tkachov via Gcc-patches
Hi Jonathan,

> -Original Message-
> From: Jonathan Wright 
> Sent: 23 July 2021 09:22
> To: gcc-patches@gcc.gnu.org
> Cc: Kyrylo Tkachov ; Richard Sandiford
> 
> Subject: [PATCH 1/8] aarch64: Use memcpy to copy vector tables in
> vqtbl[234] intrinsics
> 
> Hi,
> 
> This patch uses __builtin_memcpy to copy vector structures instead of
> building a new opaque structure one vector at a time in each of the
> vqtbl[234] Neon intrinsics in arm_neon.h. This simplifies the header file
> and also improves code generation - superfluous move instructions
> were emitted for every register extraction/set in this additional
> structure.
> 
> Add new code generation tests to verify that superfluous move
> instructions are no longer generated for the vqtbl[234] intrinsics.
> 
> Regression tested and bootstrapped on aarch64-none-linux-gnu - no
> issues.
> 
> Ok for master?
> 

In the testcase:
diff --git a/gcc/testsuite/gcc.target/aarch64/vector_structure_intrinsics.c 
b/gcc/testsuite/gcc.target/aarch64/vector_structure_intrinsics.c
new file mode 100644
index 
..2fab0f2947b7fa28e4e3a77bd365dcfdf30a9b28
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/vector_structure_intrinsics.c
@@ -0,0 +1,45 @@
+/* { dg-skip-if "" { arm*-*-* } } */

Files in gcc.target/aarch64 won't be attempted on arm* targets so the skip-if 
isn't needed (that's only for tests in gcc.target/aarch64/advsimd-intrinsics/).

Ok with that directive removed, thanks for doing this!
Kyrill


> Thanks,
> Jonathan
> 
> ---
> 
> gcc/ChangeLog:
> 
> 2021-07-08  Jonathan Wright  
> 
>   * config/aarch64/arm_neon.h (vqtbl2_s8): Use __builtin_memcpy
>   instead of constructing __builtin_aarch64_simd_oi one vector
>   at a time.
>   (vqtbl2_u8): Likewise.
>   (vqtbl2_p8): Likewise.
>   (vqtbl2q_s8): Likewise.
>   (vqtbl2q_u8): Likewise.
>   (vqtbl2q_p8): Likewise.
>   (vqtbl3_s8): Use __builtin_memcpy instead of constructing
>   __builtin_aarch64_simd_ci one vector at a time.
>   (vqtbl3_u8): Likewise.
>   (vqtbl3_p8): Likewise.
>   (vqtbl3q_s8): Likewise.
>   (vqtbl3q_u8): Likewise.
>   (vqtbl3q_p8): Likewise.
>   (vqtbl4_s8): Use __builtin_memcpy instead of constructing
>   __builtin_aarch64_simd_xi one vector at a time.
>   (vqtbl4_u8): Likewise.
>   (vqtbl4_p8): Likewise.
>   (vqtbl4q_s8): Likewise.
>   (vqtbl4q_u8): Likewise.
>   (vqtbl4q_p8): Likewise.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/aarch64/vector_structure_intrinsics.c: New test.



Re: [PATCH v2] gcov: Add __gcov_info_to_gdca()

2021-07-23 Thread Sebastian Huber

On 23/07/2021 08:52, Martin Liška wrote:



It would be nice having a test-case that can test your approach.


The problem is that you need the linker set to get access to the gcov 
information. The test program of the commit message works on my Linux 
machine. I am not sure if it is generic enough for the test suite. 
Instead of printing the information we could compare it against an 
expected output so that we have a self-contained test program.


Yep, that would be nice.


I tried to run the attached test case as 
"gcc/testsuite/gcc.dg/gcov-info-to-gcda.c". However, I get this error:


Invoking the compiler as /tmp/sh/b-gcc-git-linux/gcc/xgcc 
-B/tmp/sh/b-gcc-git-linux/gcc/ 
/home/EB/sebastian_h/src/gcc/gcc/testsuite/gcc.dg/gcov-info-to-gcda.c 
 -fdiagnostics-plain-output   -fprofile-arcs -fprofile-info-section 
-lm  -o ./gcov-info-to-gcda.exe

Setting timeout to 300
Executing on host: /tmp/sh/b-gcc-git-linux/gcc/xgcc 
-B/tmp/sh/b-gcc-git-linux/gcc/ 
/home/EB/sebastian_h/src/gcc/gcc/testsuite/gcc.dg/gcov-info-to-gcda.c 
 -fdiagnostics-plain-output   -fprofile-arcs -fprofile-info-section 
-lm  -o ./gcov-info-to-gcda.exe(timeout = 300)
spawn -ignore SIGHUP /tmp/sh/b-gcc-git-linux/gcc/xgcc 
-B/tmp/sh/b-gcc-git-linux/gcc/ 
/home/EB/sebastian_h/src/gcc/gcc/testsuite/gcc.dg/gcov-info-to-gcda.c 
-fdiagnostics-plain-output -fprofile-arcs -fprofile-info-section -lm -o 
./gcov-info-to-gcda.exe


pid is 79704 -79704
/home/EB/sebastian_h/src/gcc/gcc/testsuite/gcc.dg/gcov-info-to-gcda.c:5:10: 
fatal error: gcov.h: No such file or directory


compilation terminated.


Is it possible to write this kind of test? Do I also have to link with -lgcov?

--
embedded brains GmbH
Mr. Sebastian HUBER
Dornierstr. 4
82178 Puchheim
Germany
email: sebastian.hu...@embedded-brains.de
phone: +49-89-18 94 741 - 16
fax:   +49-89-18 94 741 - 08

Register court: Amtsgericht München
Registration number: HRB 157899
Managing directors authorized to represent the company: Peter Rasmussen, Thomas Dörfler
Our privacy policy can be found here:
https://embedded-brains.de/datenschutzerklaerung/
/* { dg-do run } */
/* { dg-skip-if "profile-info-section" { powerpc-ibm-aix* } } */
/* { dg-options "-fprofile-arcs -fprofile-info-section" } */

#include <gcov.h>

/* The .set directive in main defines this symbol as an alias of the
   compiler-generated label .LPBX2.  */
extern const struct gcov_info *my_info;

/* Passed as the user argument to every callback; the dump callback
   accumulates the sizes it is given here.  */
static unsigned counter;

/* File name callback.  */
static void
filename (const char *f, void *arg)
{
  if (arg != &counter)
    __builtin_abort ();

  if (__builtin_strcmp (f, __FILE__) != 0)
    __builtin_abort ();
}

/* Dump callback: the first chunk must begin with the gcda magic
   0x67636461 ("gcda").  */
static void
dump (const void *d, unsigned n, void *arg)
{
  unsigned *m = (unsigned *) arg;
  if (m != &counter)
    __builtin_abort ();

  if (*m == 0)
    {
      const unsigned *u = d;
      if (*u != 0x67636461)
        __builtin_abort ();
    }

  *m += n;
}

/* Allocation callback.  */
static void *
allocate (unsigned length, void *arg)
{
  if (arg != &counter)
    __builtin_abort ();

  return __builtin_malloc (length);
}

int
main (void)
{
  __asm__ volatile (".set my_info, .LPBX2");
  __gcov_info_to_gcda (my_info, filename, dump, allocate, &counter);
  return 0;
}


[PATCH 5/8] aarch64: Use memcpy to copy vector tables in vst4[q] intrinsics

2021-07-23 Thread Jonathan Wright via Gcc-patches
Hi,

This patch uses __builtin_memcpy to copy vector structures instead of
building a new opaque structure one vector at a time in each of the
vst4[q] Neon intrinsics in arm_neon.h. This simplifies the header file
and also improves code generation - superfluous move instructions
were emitted for every register extraction/set in this additional
structure.
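
As a reminder of what the vst4[q] intrinsics (and, with three vectors, vst3[q]) store, here is a scalar sketch of the interleaving pattern for 8-bit lanes. It is illustrative only: the function name and signature are hypothetical, and the real intrinsics take Neon structure types rather than arrays of pointers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Illustrative scalar model of an interleaving structure store such as
// vst4 (NVEC = 4) or vst3 (NVEC = 3): lane I of vector J is written to
// OUT[NVEC * I + J].
void interleaved_store_model(std::uint8_t *out,
                             const std::uint8_t *const val[],
                             std::size_t nvec, std::size_t lanes) {
  assert(nvec >= 2 && nvec <= 4);
  for (std::size_t i = 0; i < lanes; ++i)
    for (std::size_t j = 0; j < nvec; ++j)
      out[nvec * i + j] = val[j][i];
}
```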

Add new code generation tests to verify that superfluous move
instructions are no longer generated for the vst4q intrinsics.

Regression tested and bootstrapped on aarch64-none-linux-gnu - no
issues.

Ok for master?

Thanks,
Jonathan

---

gcc/ChangeLog:

2021-07-20  Jonathan Wright  

* config/aarch64/arm_neon.h (vst4_s64): Use __builtin_memcpy
instead of constructing __builtin_aarch64_simd_xi one vector
at a time.
(vst4_u64): Likewise.
(vst4_f64): Likewise.
(vst4_s8): Likewise.
(vst4_p8): Likewise.
(vst4_s16): Likewise.
(vst4_p16): Likewise.
(vst4_s32): Likewise.
(vst4_u8): Likewise.
(vst4_u16): Likewise.
(vst4_u32): Likewise.
(vst4_f16): Likewise.
(vst4_f32): Likewise.
(vst4_p64): Likewise.
(vst4q_s8): Likewise.
(vst4q_p8): Likewise.
(vst4q_s16): Likewise.
(vst4q_p16): Likewise.
(vst4q_s32): Likewise.
(vst4q_s64): Likewise.
(vst4q_u8): Likewise.
(vst4q_u16): Likewise.
(vst4q_u32): Likewise.
(vst4q_u64): Likewise.
(vst4q_f16): Likewise.
(vst4q_f32): Likewise.
(vst4q_f64): Likewise.
(vst4q_p64): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/vector_structure_intrinsics.c: Add new
tests.


rb14687.patch
Description: rb14687.patch


[PATCH 6/8] aarch64: Use memcpy to copy vector tables in vst3[q] intrinsics

2021-07-23 Thread Jonathan Wright via Gcc-patches
Hi,

This patch uses __builtin_memcpy to copy vector structures instead of
building a new opaque structure one vector at a time in each of the
vst3[q] Neon intrinsics in arm_neon.h. This simplifies the header file
and also improves code generation - superfluous move instructions
were emitted for every register extraction/set in this additional
structure.

Add new code generation tests to verify that superfluous move
instructions are no longer generated for the vst3q intrinsics.

Regression tested and bootstrapped on aarch64-none-linux-gnu - no
issues.

Ok for master?

Thanks,
Jonathan

---

gcc/ChangeLog:

2021-07-21  Jonathan Wright  

* config/aarch64/arm_neon.h (vst3_s64): Use __builtin_memcpy
instead of constructing __builtin_aarch64_simd_ci one vector
at a time.
(vst3_u64): Likewise.
(vst3_f64): Likewise.
(vst3_s8): Likewise.
(vst3_p8): Likewise.
(vst3_s16): Likewise.
(vst3_p16): Likewise.
(vst3_s32): Likewise.
(vst3_u8): Likewise.
(vst3_u16): Likewise.
(vst3_u32): Likewise.
(vst3_f16): Likewise.
(vst3_f32): Likewise.
(vst3_p64): Likewise.
(vst3q_s8): Likewise.
(vst3q_p8): Likewise.
(vst3q_s16): Likewise.
(vst3q_p16): Likewise.
(vst3q_s32): Likewise.
(vst3q_s64): Likewise.
(vst3q_u8): Likewise.
(vst3q_u16): Likewise.
(vst3q_u32): Likewise.
(vst3q_u64): Likewise.
(vst3q_f16): Likewise.
(vst3q_f32): Likewise.
(vst3q_f64): Likewise.
(vst3q_p64): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/vector_structure_intrinsics.c: Add new
tests.


rb14688.patch
Description: rb14688.patch


RE: [PATCH 2/8] aarch64: Use memcpy to copy vector tables in vqtbx[234] intrinsics

2021-07-23 Thread Kyrylo Tkachov via Gcc-patches



> -Original Message-
> From: Jonathan Wright 
> Sent: 23 July 2021 09:27
> To: gcc-patches@gcc.gnu.org
> Cc: Kyrylo Tkachov ; Richard Sandiford
> 
> Subject: [PATCH 2/8] aarch64: Use memcpy to copy vector tables in
> vqtbx[234] intrinsics
> 
> Hi,
> 
> This patch uses __builtin_memcpy to copy vector structures instead of
> building a new opaque structure one vector at a time in each of the
> vqtbx[234] Neon intrinsics in arm_neon.h. This simplifies the header
> file and also improves code generation - superfluous move
> instructions were emitted for every register extraction/set in this
> additional structure.
> 
> Add new code generation tests to verify that superfluous move
> instructions are no longer generated for the vqtbx[234] intrinsics.
> 
> Regression tested and bootstrapped on aarch64-none-linux-gnu - no
> issues.
> 
> Ok for master?
> 

Ok.
Thanks,
Kyrill

> Thanks,
> Jonathan
> 
> ---
> 
> gcc/ChangeLog:
> 
> 2021-07-08  Jonathan Wright  
> 
>   * config/aarch64/arm_neon.h (vqtbx2_s8): Use __builtin_memcpy
>   instead of constructing __builtin_aarch64_simd_oi one vector
>   at a time.
>   (vqtbx2_u8): Likewise.
>   (vqtbx2_p8): Likewise.
>   (vqtbx2q_s8): Likewise.
>   (vqtbx2q_u8): Likewise.
>   (vqtbx2q_p8): Likewise.
>   (vqtbx3_s8): Use __builtin_memcpy instead of constructing
>   __builtin_aarch64_simd_ci one vector at a time.
>   (vqtbx3_u8): Likewise.
>   (vqtbx3_p8): Likewise.
>   (vqtbx3q_s8): Likewise.
>   (vqtbx3q_u8): Likewise.
>   (vqtbx3q_p8): Likewise.
>   (vqtbx4_s8): Use __builtin_memcpy instead of constructing
>   __builtin_aarch64_simd_xi one vector at a time.
>   (vqtbx4_u8): Likewise.
>   (vqtbx4_p8): Likewise.
>   (vqtbx4q_s8): Likewise.
>   (vqtbx4q_u8): Likewise.
>   (vqtbx4q_p8): Likewise.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/aarch64/vector_structure_intrinsics.c: New tests.


RE: [PATCH 3/8] aarch64: Use memcpy to copy vector tables in vtbl[34] intrinsics

2021-07-23 Thread Kyrylo Tkachov via Gcc-patches



> -Original Message-
> From: Jonathan Wright 
> Sent: 23 July 2021 09:30
> To: gcc-patches@gcc.gnu.org
> Cc: Kyrylo Tkachov ; Richard Sandiford
> 
> Subject: [PATCH 3/8] aarch64: Use memcpy to copy vector tables in vtbl[34]
> intrinsics
> 
> Hi,
> 
> This patch uses __builtin_memcpy to copy vector structures instead of
> building a new opaque structure one vector at a time in each of the
> vtbl[34] Neon intrinsics in arm_neon.h. This simplifies the header file
> and also improves code generation - superfluous move instructions
> were emitted for every register extraction/set in this additional
> structure.
> 
> Regression tested and bootstrapped on aarch64-none-linux-gnu - no
> issues.
> 
> Ok for master?

Ok.
Thanks,
Kyrill

> 
> Thanks,
> Jonathan
> 
> ---
> 
> gcc/ChangeLog:
> 
> 2021-07-08  Jonathan Wright  
> 
>   * config/aarch64/arm_neon.h (vtbl3_s8): Use __builtin_memcpy
>   instead of constructing __builtin_aarch64_simd_oi one vector
>   at a time.
>   (vtbl3_u8): Likewise.
>   (vtbl3_p8): Likewise.
>   (vtbl4_s8): Likewise.
>   (vtbl4_u8): Likewise.
>   (vtbl4_p8): Likewise.


RE: [PATCH 4/8] aarch64: Use memcpy to copy vector tables in vtbx4 intrinsics

2021-07-23 Thread Kyrylo Tkachov via Gcc-patches



> -Original Message-
> From: Jonathan Wright 
> Sent: 23 July 2021 10:15
> To: gcc-patches@gcc.gnu.org
> Cc: Kyrylo Tkachov ; Richard Sandiford
> 
> Subject: [PATCH 4/8] aarch64: Use memcpy to copy vector tables in vtbx4
> intrinsics
> 
> Hi,
> 
> This patch uses __builtin_memcpy to copy vector structures instead of
> building a new opaque structure one vector at a time in each of the
> vtbx4 Neon intrinsics in arm_neon.h. This simplifies the header file
> and also improves code generation - superfluous move instructions
> were emitted for every register extraction/set in this additional
> structure.
> 
> Regression tested and bootstrapped on aarch64-none-linux-gnu - no
> issues.
> 
> Ok for master?

Ok.
Thanks,
Kyrill

> 
> Thanks,
> Jonathan
> 
> ---
> 
> gcc/ChangeLog:
> 
> 2021-07-19  Jonathan Wright  
> 
>   * config/aarch64/arm_neon.h (vtbx4_s8): Use __builtin_memcpy
>   instead of constructing __builtin_aarch64_simd_oi one vector
>   at a time.
>   (vtbx4_u8): Likewise.
>   (vtbx4_p8): Likewise.



Re: [PATCH 3/8] aarch64: Use memcpy to copy vector tables in vtbl[34] intrinsics

2021-07-23 Thread Richard Sandiford via Gcc-patches
Kyrylo Tkachov  writes:
>> -Original Message-
>> From: Jonathan Wright 
>> Sent: 23 July 2021 09:30
>> To: gcc-patches@gcc.gnu.org
>> Cc: Kyrylo Tkachov ; Richard Sandiford
>> 
>> Subject: [PATCH 3/8] aarch64: Use memcpy to copy vector tables in vtbl[34]
>> intrinsics
>>
>> Hi,
>>
>> This patch uses __builtin_memcpy to copy vector structures instead of
>> building a new opaque structure one vector at a time in each of the
>> vtbl[34] Neon intrinsics in arm_neon.h. This simplifies the header file
>> and also improves code generation - superfluous move instructions
>> were emitted for every register extraction/set in this additional
>> structure.
>>
>> Regression tested and bootstrapped on aarch64-none-linux-gnu - no
>> issues.
>>
>> Ok for master?
>
> Ok.

Please add testcases first though. :-)

Thanks,
Richard


RE: [PATCH 5/8] aarch64: Use memcpy to copy vector tables in vst4[q] intrinsics

2021-07-23 Thread Kyrylo Tkachov via Gcc-patches



> -Original Message-
> From: Jonathan Wright 
> Sent: 23 July 2021 10:22
> To: gcc-patches@gcc.gnu.org
> Cc: Kyrylo Tkachov ; Richard Sandiford
> 
> Subject: [PATCH 5/8] aarch64: Use memcpy to copy vector tables in vst4[q]
> intrinsics
> 
> Hi,
> 
> This patch uses __builtin_memcpy to copy vector structures instead of
> building a new opaque structure one vector at a time in each of the
> vst4[q] Neon intrinsics in arm_neon.h. This simplifies the header file
> and also improves code generation - superfluous move instructions
> were emitted for every register extraction/set in this additional
> structure.
> 
> Add new code generation tests to verify that superfluous move
> instructions are no longer generated for the vst4q intrinsics.
> 
> Regression tested and bootstrapped on aarch64-none-linux-gnu - no
> issues.
> 
> Ok for master?

Ok.
Thanks,
Kyrill

> 
> Thanks,
> Jonathan
> 
> ---
> 
> gcc/ChangeLog:
> 
> 2021-07-20  Jonathan Wright  
> 
>   * config/aarch64/arm_neon.h (vst4_s64): Use __builtin_memcpy
>   instead of constructing __builtin_aarch64_simd_xi one vector
>   at a time.
>   (vst4_u64): Likewise.
>   (vst4_f64): Likewise.
>   (vst4_s8): Likewise.
>   (vst4_p8): Likewise.
>   (vst4_s16): Likewise.
>   (vst4_p16): Likewise.
>   (vst4_s32): Likewise.
>   (vst4_u8): Likewise.
>   (vst4_u16): Likewise.
>   (vst4_u32): Likewise.
>   (vst4_f16): Likewise.
>   (vst4_f32): Likewise.
>   (vst4_p64): Likewise.
>   (vst4q_s8): Likewise.
>   (vst4q_p8): Likewise.
>   (vst4q_s16): Likewise.
>   (vst4q_p16): Likewise.
>   (vst4q_s32): Likewise.
>   (vst4q_s64): Likewise.
>   (vst4q_u8): Likewise.
>   (vst4q_u16): Likewise.
>   (vst4q_u32): Likewise.
>   (vst4q_u64): Likewise.
>   (vst4q_f16): Likewise.
>   (vst4q_f32): Likewise.
>   (vst4q_f64): Likewise.
>   (vst4q_p64): Likewise.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/aarch64/vector_structure_intrinsics.c: Add new
>   tests.



Re: [PATCH 4/8] aarch64: Use memcpy to copy vector tables in vtbx4 intrinsics

2021-07-23 Thread Richard Sandiford via Gcc-patches
Kyrylo Tkachov  writes:
>> -Original Message-
>> From: Jonathan Wright 
>> Sent: 23 July 2021 10:15
>> To: gcc-patches@gcc.gnu.org
>> Cc: Kyrylo Tkachov ; Richard Sandiford
>> 
>> Subject: [PATCH 4/8] aarch64: Use memcpy to copy vector tables in vtbx4
>> intrinsics
>>
>> Hi,
>>
>> This patch uses __builtin_memcpy to copy vector structures instead of
>> building a new opaque structure one vector at a time in each of the
>> vtbx4 Neon intrinsics in arm_neon.h. This simplifies the header file
>> and also improves code generation - superfluous move instructions
>> were emitted for every register extraction/set in this additional
>> structure.
>>
>> Regression tested and bootstrapped on aarch64-none-linux-gnu - no
>> issues.
>>
>> Ok for master?
>
> Ok.

Here too I think we want some testcases…

Thanks,
Richard


Re: [PATCH v2] gcov: Add __gcov_info_to_gdca()

2021-07-23 Thread Sebastian Huber




On 23/07/2021 11:17, Sebastian Huber wrote:

On 23/07/2021 08:52, Martin Liška wrote:



It would be nice having a test-case that can test your approach.


The problem is that you need the linker set to get access to the gcov 
information. The test program of the commit message works on my Linux 
machine. I am not sure if it is generic enough for the test suite. 
Instead of printing the information we could compare it against an 
expected output so that we have a self-contained test program.
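A minimal sketch of that idea. It assumes the dump callback keeps the `(const void *, unsigned, void *)` shape used by the crude test program in the patch's commit message. Instead of printing, the callback appends to a caller-supplied buffer, and the test compares that buffer with the expected bytes. `struct byte_sink`, `sink_dump` and `sink_matches` are hypothetical names.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical sink that collects everything the dump callback receives,
   so a test can compare it with an expected byte sequence.  */
struct byte_sink
{
  unsigned char *buf;   /* caller-provided storage */
  size_t capacity;      /* size of buf in bytes */
  size_t length;        /* bytes written so far */
  int overflow;         /* set if some data did not fit */
};

/* Same shape as the dump callback in the test program:
   (const void *data, unsigned size, void *arg).  */
static void
sink_dump (const void *d, unsigned n, void *arg)
{
  struct byte_sink *s = arg;

  if (n > s->capacity - s->length)
    {
      s->overflow = 1;  /* drop the chunk rather than write past buf */
      return;
    }
  memcpy (s->buf + s->length, d, n);
  s->length += n;
}

/* Returns nonzero if the collected bytes match EXPECTED exactly.  */
static int
sink_matches (const struct byte_sink *s, const void *expected, size_t len)
{
  return !s->overflow && s->length == len
         && memcmp (s->buf, expected, len) == 0;
}
```

A real test would pass a pointer to the sink as the `arg` argument of `__gcov_info_to_gcda()` and compare against the expected gcda bytes once the call returns.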


Yep, that would be nice.


I tried to run the attached test case as 
"gcc/testsuite/gcc.dg/gcov-info-to-gcda.c". However, I get this error:


Invoking the compiler as /tmp/sh/b-gcc-git-linux/gcc/xgcc 
-B/tmp/sh/b-gcc-git-linux/gcc/ 
/home/EB/sebastian_h/src/gcc/gcc/testsuite/gcc.dg/gcov-info-to-gcda.c 
  -fdiagnostics-plain-output   -fprofile-arcs -fprofile-info-section 
-lm  -o ./gcov-info-to-gcda.exe

Setting timeout to 300
Executing on host: /tmp/sh/b-gcc-git-linux/gcc/xgcc 
-B/tmp/sh/b-gcc-git-linux/gcc/ 
/home/EB/sebastian_h/src/gcc/gcc/testsuite/gcc.dg/gcov-info-to-gcda.c 
  -fdiagnostics-plain-output   -fprofile-arcs -fprofile-info-section 
-lm  -o ./gcov-info-to-gcda.exe    (timeout = 300)
spawn -ignore SIGHUP /tmp/sh/b-gcc-git-linux/gcc/xgcc 
-B/tmp/sh/b-gcc-git-linux/gcc/ 
/home/EB/sebastian_h/src/gcc/gcc/testsuite/gcc.dg/gcov-info-to-gcda.c 
-fdiagnostics-plain-output -fprofile-arcs -fprofile-info-section -lm -o 
./gcov-info-to-gcda.exe


pid is 79704 -79704
/home/EB/sebastian_h/src/gcc/gcc/testsuite/gcc.dg/gcov-info-to-gcda.c:5:10: 
fatal error: gcov.h: No such file or directory


compilation terminated.


Is it possible to write this kind of test? Do I also have to link with -lgcov?


Ok, the linking is not the problem. If I declare __gcov_info_to_gcda() 
locally, the test runs.


--
embedded brains GmbH
Herr Sebastian HUBER
Dornierstr. 4
82178 Puchheim
Germany
email: sebastian.hu...@embedded-brains.de
phone: +49-89-18 94 741 - 16
fax:   +49-89-18 94 741 - 08

Registergericht: Amtsgericht München
Registernummer: HRB 157899
Vertretungsberechtigte Geschäftsführer: Peter Rasmussen, Thomas Dörfler
Unsere Datenschutzerklärung finden Sie hier:
https://embedded-brains.de/datenschutzerklaerung/


Re: [ARM] PR66791: Replace builtins in vshl_n

2021-07-23 Thread Richard Earnshaw via Gcc-patches
On 23/07/2021 08:04, Prathamesh Kulkarni via Gcc-patches wrote:
> On Thu, 22 Jul 2021 at 20:29, Richard Earnshaw
>  wrote:
>>
>>
>>
>> On 22/07/2021 14:47, Prathamesh Kulkarni via Gcc-patches wrote:
>>> On Thu, 22 Jul 2021 at 17:28, Richard Earnshaw
>>>  wrote:



 On 22/07/2021 12:32, Prathamesh Kulkarni wrote:
> On Thu, 22 Jul 2021 at 16:03, Richard Earnshaw
>  wrote:
>>
>>
>>
>> On 22/07/2021 08:45, Prathamesh Kulkarni via Gcc-patches wrote:
>>> Hi,
>>> The attached patch removes calls to builtins from vshl_n intrinsics,
>>> and replaces them
>>> with left shift operator. The patch passes bootstrap+test on
>>> arm-linux-gnueabihf.
>>>
>>> However, I noticed that the patch causes 3 extra registers to spill
>>> when using << instead of the builtin for vshl_n.c. Could that be due
>>> to inlining of intrinsics? Before the patch, the shift operation was
>>> performed by a call to __builtin_neon_vshl (__a, __b), and now it's
>>> inlined to __a << __b, which might result in increased register
>>> pressure?
>>>
>>> Thanks,
>>> Prathamesh
>>>
>>
>>
>> You're missing a ChangeLog for the patch.
> Sorry, updated in this patch.
>>
>> However, I'm not sure about this.  The register shift form of VSHL
>> performs a right shift if the value is negative, which is UB if you
>> write `<<` instead.
>>
>> Have I missed something here?
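To make the semantic gap concrete, here is a rough scalar model of one 32-bit lane of the register form of VSHL, contrasted with what plain `<<` guarantees. The function name is hypothetical. The clamping for shift counts of magnitude 32 or more is this sketch's reading of the instruction, not something stated in the thread.

```c
#include <stdint.h>

/* Scalar model of one 32-bit lane of the Neon VSHL (register) form:
   the count is signed, and a negative count shifts right instead of
   left.  Plain C "x << n" has no such behaviour: a negative n, or
   n >= 32, is undefined.  */
static int32_t
vshl_lane_s32_model (int32_t x, int count)
{
  if (count >= 32)
    return 0;                        /* everything shifted out */
  if (count >= 0)
    /* Shift as unsigned, since "x << count" on a negative int32_t is UB.
       The conversion back relies on two's-complement wrapping.  */
    return (int32_t) ((uint32_t) x << count);
  if (count <= -32)
    return x < 0 ? -1 : 0;           /* only sign bits remain */
  /* Negative count: arithmetic right shift by -count, written so it does
     not depend on ">>" of a negative value (implementation-defined).  */
  return x < 0 ? ~(~x >> -count) : x >> -count;
}
```

With this model, a count of -2 applied to -16 yields -4, which is exactly the case that `__a << __b` cannot express.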
> Hi Richard,
> According to this article:
> https://developer.arm.com/documentation/den0018/a/NEON-Intrinsics-Reference/Shift/VSHL-N
> For vshl_n, the shift amount is always in the non-negative range for all 
> types.
>
> I tried using vshl_n_s32 (a, -1), and the compiler emitted the following
> diagnostic:
> foo.c: In function ‘main’:
> foo.c:17:1: error: constant -1 out of range 0 - 31
>  17 | }
> | ^
>

 It does do that now, but that's because the intrinsic expansion does
 some bounds checking; when you remove the call into the back-end
 intrinsic, that will no longer happen.

 I think with this change various things are likely:

 - We'll no longer reject non-immediate values, so users will be able to
 write

   int b = 5;
  vshl_n_s32 (a, b);

 which will expand to a vdup followed by the register form.

 - we'll rely on the front-end diagnosing out-of-range shifts

 - code of the form

  int b = -1;
  vshl_n_s32 (a, b);

 will probably now go through without any errors, especially at low
 optimization levels.  It may end up doing what the user wanted, but it's
 definitely a change in behaviour - and perhaps worse, the compiler might
 diagnose the above as UB and silently throw some stuff away.

 It might be that we need to insert some form of static assertion that
 the second argument is a __builtin_constant_p().
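The first and third points can be seen with GCC's generic vector extension, which is roughly what `__a << __b` amounts to once the builtin is gone. A shift by a run-time scalar compiles without complaint, so any range checking has to be done explicitly. The sketch below is illustrative only. The helper and its runtime check are hypothetical and are not something proposed in the thread, and the generic vector type is a portable stand-in for `int32x2_t`.

```c
#include <stdint.h>

/* Generic GCC/Clang vector standing in for int32x2_t so the sketch builds
   on any host (the real type comes from arm_neon.h).  */
typedef int32_t v2si __attribute__ ((vector_size (8)));

/* Hypothetical helper, not part of arm_neon.h.  B is a run-time value,
   which the builtin-based intrinsic would have rejected; counts outside
   0..31 are refused here instead of being left to undefined behaviour.  */
static int
vshl_n_s32_checked (const v2si *a, int b, v2si *out)
{
  if (b < 0 || b > 31)
    return 0;             /* the current intrinsic diagnoses these */
  *out = *a << b;         /* the scalar B is applied to every lane */
  return 1;
}
```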
>>> Ah right, thanks for the suggestions!
>>> I tried the above example:
>>> int b = -1;
>>> vshl_n_s32 (a, b);
>>> and it compiled without any errors with -O0 after patch.
>>>
>>> Would it be OK to use _Static_assert (__builtin_constant_p (b)) to
>>> guard against non-immediate values?
>>>
>>> With the following change:
>>> __extension__ extern __inline int32x2_t
>>> __attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
>>> vshl_n_s32 (int32x2_t __a, const int __b)
>>> {
>>>   _Static_assert (__builtin_constant_p (__b));
>>>   return __a << __b;
>>> }
>>>
>>> the above example fails at -O0:
>>> ../armhf-build/gcc/include/arm_neon.h: In function ‘vshl_n_s32’:
>>> ../armhf-build/gcc/include/arm_neon.h:4904:3: error: static assertion failed
>>>   4904 |   _Static_assert (__builtin_constant_p (__b));
>>>|   ^~
>>
>> I've been playing with that but unfortunately it doesn't seem to work in
>> the way we want it to.  For a complete test:
>>
>>
>>
>> typedef __simd64_int32_t int32x2_t;
>>
>> __extension__ extern __inline int32x2_t
>> __attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
>> vshl_n_s32 (int32x2_t __a, const int __b)
>> {
>>   _Static_assert (__builtin_constant_p (__b), "Second argument must be a litteral constant");
>>   return __a << __b;
>> }
>>
>> int32x2_t f (int32x2_t x, const int b)
>> {
>>   return vshl_n_s32 (x, 1);
>> }
>>
>> At -O0 I get:
>>
>> test.c: In function ‘vshl_n_s32’:
>> test.c:7:3: error: static assertion failed: "Second argument must be a
>> litteral constant"
>>  7 |   _Static_assert (__builtin_constant_p (__b), "Second argument
>> must be a litteral constant");
>>|   ^~
>>
>> While at -O1 and above I get:
>>
>>
>> test.c: In function ‘vshl_n_s32’:
>> test.c:7:19: error: expression in static assertion is not constant
>>  7 |   _Static_assert (__builtin_constant_p (__b), "Second argument
>> must be

Re: [PATCH 3/8] aarch64: Use memcpy to copy vector tables in vtbl[34] intrinsics

2021-07-23 Thread Jonathan Wright via Gcc-patches
I haven't added test cases here because these intrinsics don't map to
a single instruction (they're legacy from Armv7) and would trip the
"scan-assembler not mov" that we're using for the other tests.

Jonathan

From: Richard Sandiford 
Sent: 23 July 2021 10:29
To: Kyrylo Tkachov 
Cc: Jonathan Wright ; gcc-patches@gcc.gnu.org 

Subject: Re: [PATCH 3/8] aarch64: Use memcpy to copy vector tables in vtbl[34] 
intrinsics

Kyrylo Tkachov  writes:
>> -Original Message-
>> From: Jonathan Wright 
>> Sent: 23 July 2021 09:30
>> To: gcc-patches@gcc.gnu.org
>> Cc: Kyrylo Tkachov ; Richard Sandiford
>> 
>> Subject: [PATCH 3/8] aarch64: Use memcpy to copy vector tables in vtbl[34]
>> intrinsics
>>
>> Hi,
>>
>> This patch uses __builtin_memcpy to copy vector structures instead of
>> building a new opaque structure one vector at a time in each of the
>> vtbl[34] Neon intrinsics in arm_neon.h. This simplifies the header file
>> and also improves code generation - superfluous move instructions
>> were emitted for every register extraction/set in this additional
>> structure.
>>
>> Regression tested and bootstrapped on aarch64-none-linux-gnu - no
>> issues.
>>
>> Ok for master?
>
> Ok.

Please add testcases first though. :-)

Thanks,
Richard


RE: [PATCH 6/8] aarch64: Use memcpy to copy vector tables in vst3[q] intrinsics

2021-07-23 Thread Kyrylo Tkachov via Gcc-patches



> -Original Message-
> From: Jonathan Wright 
> Sent: 23 July 2021 10:25
> To: gcc-patches@gcc.gnu.org
> Cc: Kyrylo Tkachov ; Richard Sandiford
> 
> Subject: [PATCH 6/8] aarch64: Use memcpy to copy vector tables in vst3[q]
> intrinsics
> 
> Hi,
> 
> This patch uses __builtin_memcpy to copy vector structures instead of
> building a new opaque structure one vector at a time in each of the
> vst3[q] Neon intrinsics in arm_neon.h. This simplifies the header file
> and also improves code generation - superfluous move instructions
> were emitted for every register extraction/set in this additional
> structure.
> 
> Add new code generation tests to verify that superfluous move
> instructions are no longer generated for the vst3q intrinsics.
> 
> Regression tested and bootstrapped on aarch64-none-linux-gnu - no
> issues.
> 
> Ok for master?

Ok.
Thanks,
Kyrill

> 
> Thanks,
> Jonathan
> 
> ---
> 
> gcc/ChangeLog:
> 
> 2021-07-21  Jonathan Wright  
> 
>   * config/aarch64/arm_neon.h (vst3_s64): Use __builtin_memcpy
>   instead of constructing __builtin_aarch64_simd_ci one vector
>   at a time.
>   (vst3_u64): Likewise.
>   (vst3_f64): Likewise.
>   (vst3_s8): Likewise.
>   (vst3_p8): Likewise.
>   (vst3_s16): Likewise.
>   (vst3_p16): Likewise.
>   (vst3_s32): Likewise.
>   (vst3_u8): Likewise.
>   (vst3_u16): Likewise.
>   (vst3_u32): Likewise.
>   (vst3_f16): Likewise.
>   (vst3_f32): Likewise.
>   (vst3_p64): Likewise.
>   (vst3q_s8): Likewise.
>   (vst3q_p8): Likewise.
>   (vst3q_s16): Likewise.
>   (vst3q_p16): Likewise.
>   (vst3q_s32): Likewise.
>   (vst3q_s64): Likewise.
>   (vst3q_u8): Likewise.
>   (vst3q_u16): Likewise.
>   (vst3q_u32): Likewise.
>   (vst3q_u64): Likewise.
>   (vst3q_f16): Likewise.
>   (vst3q_f32): Likewise.
>   (vst3q_f64): Likewise.
>   (vst3q_p64): Likewise.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/aarch64/vector_structure_intrinsics.c: Add new
>   tests.



[PATCH 7/8] aarch64: Use memcpy to copy vector tables in vst2[q] intrinsics

2021-07-23 Thread Jonathan Wright via Gcc-patches
Hi,

This patch uses __builtin_memcpy to copy vector structures instead of
building a new opaque structure one vector at a time in each of the
vst2[q] Neon intrinsics in arm_neon.h. This simplifies the header file
and also improves code generation - superfluous move instructions
were emitted for every register extraction/set in this additional
structure.
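The source-level change can be illustrated in portable C. This is a sketch only. Generic vectors stand in for `int8x16_t`/`int8x16x4_t`, and a plain struct plays the role of the opaque `__builtin_aarch64_simd_xi` tuple. `arm_neon.h` itself uses `__builtin_memcpy` because it cannot rely on `<string.h>`.

```c
#include <stdint.h>
#include <string.h>

/* Portable stand-ins (assumption): a generic 16-byte vector for
   int8x16_t, a 4-vector tuple like int8x16x4_t, and a struct playing the
   role of the opaque register tuple.  */
typedef int8_t v16qi __attribute__ ((vector_size (16)));
typedef struct { v16qi val[4]; } v16qi_x4;
typedef struct { v16qi reg[4]; } opaque_x4;

/* Old style: build the opaque tuple one vector at a time.  */
static void
pack_per_vector (opaque_x4 *o, const v16qi_x4 *v)
{
  for (int i = 0; i < 4; i++)
    o->reg[i] = v->val[i];
}

/* New style: a single block copy of the whole structure.  */
static void
pack_memcpy (opaque_x4 *o, const v16qi_x4 *v)
{
  _Static_assert (sizeof (opaque_x4) == sizeof (v16qi_x4),
                  "tuple sizes must match");
  memcpy (o, v, sizeof *v);
}
```

Both produce identical bytes. The gain the patch reports is in the code GCC generates for the real AArch64 opaque types, which a host-side sketch cannot show.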

Add new code generation tests to verify that superfluous move
instructions are no longer generated for the vst2q intrinsics.

Regression tested and bootstrapped on aarch64-none-linux-gnu - no
issues.

Ok for master?

Thanks,
Jonathan

---

gcc/ChangeLog:

2021-07-21  Jonathan Wright  

* config/aarch64/arm_neon.h (vst2_s64): Use __builtin_memcpy
instead of constructing __builtin_aarch64_simd_oi one vector
at a time.
(vst2_u64): Likewise.
(vst2_f64): Likewise.
(vst2_s8): Likewise.
(vst2_p8): Likewise.
(vst2_s16): Likewise.
(vst2_p16): Likewise.
(vst2_s32): Likewise.
(vst2_u8): Likewise.
(vst2_u16): Likewise.
(vst2_u32): Likewise.
(vst2_f16): Likewise.
(vst2_f32): Likewise.
(vst2_p64): Likewise.
(vst2q_s8): Likewise.
(vst2q_p8): Likewise.
(vst2q_s16): Likewise.
(vst2q_p16): Likewise.
(vst2q_s32): Likewise.
(vst2q_s64): Likewise.
(vst2q_u8): Likewise.
(vst2q_u16): Likewise.
(vst2q_u32): Likewise.
(vst2q_u64): Likewise.
(vst2q_f16): Likewise.
(vst2q_f32): Likewise.
(vst2q_f64): Likewise.
(vst2q_p64): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/vector_structure_intrinsics.c: Add new
tests.


rb14689.patch
Description: rb14689.patch


[PATCH v3] gcov: Add __gcov_info_to_gdca()

2021-07-23 Thread Sebastian Huber
Add __gcov_info_to_gcda() to libgcov to get the gcda data for a gcov info in a
freestanding environment.  It is intended to be used with the
-fprofile-info-section option.  A crude test program which doesn't use a linker
script is (use "gcc -coverage -fprofile-info-section -lgcc test.c" to compile
it):

  #include <gcov.h>
  #include <stdio.h>
  #include <stdlib.h>

  extern const struct gcov_info *my_info;

  static void
  filename (const char *f, void *arg)
  {
    printf("filename: %s\n", f);
  }

  static void
  dump (const void *d, unsigned n, void *arg)
  {
    const unsigned char *c = d;

    for (unsigned i = 0; i < n; ++i)
      printf ("%02x", c[i]);
  }

  static void *
  allocate (unsigned length, void *arg)
  {
    return malloc (length);
  }

  int main()
  {
    __asm__ volatile (".set my_info, .LPBX2");
    __gcov_info_to_gcda (my_info, filename, dump, allocate, NULL);
    return 0;
  }
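The program above relies on malloc() in its allocate callback. In a freestanding environment without a C library, one option is a fixed pool. The sketch below is hypothetical (`struct pool` and `pool_allocate` are made-up names). It assumes, as the test program suggests, that the same `arg` pointer is passed to every callback, so a real program would keep the pool and any dump state together in one struct.

```c
#include <stddef.h>

/* Hypothetical bump allocator with the same shape as the allocate
   callback in the test program: (unsigned length, void *arg).  Memory is
   never released, which should be acceptable for a one-shot dump
   (assumption).  BASE must be suitably aligned by the caller.  */
struct pool
{
  unsigned char *base;
  size_t size;
  size_t used;
};

static void *
pool_allocate (unsigned length, void *arg)
{
  struct pool *p = arg;
  const size_t align = _Alignof (max_align_t);
  /* Round the next offset up to the strictest fundamental alignment.  */
  size_t start = (p->used + align - 1) & ~(align - 1);

  if (start > p->size || length > p->size - start)
    return NULL;           /* pool exhausted */
  p->used = start + length;
  return p->base + start;
}
```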

With this patch, <stdint.h> is included in libgcov-driver.c even if
inhibit_libc is defined.  This header file should also be available for
freestanding environments.  If this is not the case, then we have to define
intptr_t somehow.

The patch removes one use of memset() which makes the <string.h> include
superfluous.

gcc/

* gcov-io.h (gcov_write): Declare.
* gcov-io.c (gcov_write): New.
(gcov_write_counter): Remove.
(gcov_write_tag_length): Likewise.
(gcov_write_summary): Replace gcov_write_tag_length() with calls to
gcov_write_unsigned().
* doc/invoke.texi (fprofile-info-section): Mention
__gcov_info_to_gcda().

gcc/testsuite/

* gcc.dg/gcov-info-to-gcda.c: New test.

libgcc/

* Makefile.in (LIBGCOV_DRIVER): Add _gcov_info_to_gcda.
* gcov.h (gcov_info): Declare.
(__gcov_info_to_gcda): Likewise.
* libgcov.h (gcov_write_counter): Remove.
(gcov_write_tag_length): Likewise.
* libgcov-driver.c (#include <stdint.h>): New.
(#include <string.h>): Remove.
(NEED_L_GCOV): Conditionally define.
(NEED_L_GCOV_INFO_TO_GCDA): Likewise.
(are_all_counters_zero): New.
(gcov_dump_handler): Likewise.
(gcov_allocate_handler): Likewise.
(dump_unsigned): Likewise.
(dump_counter): Likewise.
(write_topn_counters): Add dump_fn, allocate_fn, and arg parameters.
Use dump_unsigned() and dump_counter().
(write_one_data): Add dump_fn, allocate_fn, and arg parameters.  Use
dump_unsigned(), dump_counter(), and are_all_counters_zero().
(__gcov_info_to_gcda): New.
---
 gcc/doc/invoke.texi  |  80 +++--
 gcc/gcov-io.c|  36 ++---
 gcc/gcov-io.h|   1 +
 gcc/testsuite/gcc.dg/gcov-info-to-gcda.c |  60 +++
 libgcc/Makefile.in   |   2 +-
 libgcc/gcov.h|  19 +++
 libgcc/libgcov-driver.c  | 196 ++-
 libgcc/libgcov.h |   5 -
 8 files changed, 313 insertions(+), 86 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/gcov-info-to-gcda.c

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 32697e6117c0..5f31312b9485 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -14798,17 +14798,17 @@ To optimize the program based on the collected 
profile information, use
 Register the profile information in the specified section instead of using a
 constructor/destructor.  The section name is @var{name} if it is specified,
 otherwise the section name defaults to @code{.gcov_info}.  A pointer to the
-profile information generated by @option{-fprofile-arcs} or
-@option{-ftest-coverage} is placed in the specified section for each
-translation unit.  This option disables the profile information registration
-through a constructor and it disables the profile information processing
-through a destructor.  This option is not intended to be used in hosted
-environments such as GNU/Linux.  It targets systems with limited resources
-which do not support constructors and destructors.  The linker could collect
-the input sections in a continuous memory block and define start and end
-symbols.  The runtime support could dump the profiling information registered
-in this linker set during program termination to a serial line for example.  A
-GNU linker script example which defines a linker output section follows:
+profile information generated by @option{-fprofile-arcs} is placed in the
+specified section for each translation unit.  This option disables the profile
+information registration through a constructor and it disables the profile
+information processing through a destructor.  This option is not intended to be
+used in hosted environments such as GNU/Linux.  It targets free-standing
+environments (for example embedded systems) with limited resources which do not
+support constructors/destructors or the C library file I/O.
+
+The linker could collect the input sections in a continuous memory block and
+d

[PATCH 8/8] aarch64: Use memcpy to copy vector tables in vst1[q]_x4 intrinsics

2021-07-23 Thread Jonathan Wright via Gcc-patches
Hi,

This patch uses __builtin_memcpy to copy vector structures instead of
using a union in each of the vst1[q]_x4 Neon intrinsics in arm_neon.h.
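For comparison, here is the union-based style next to the memcpy-based one in portable C. The vector tuple and the 32-byte blob are stand-ins (assumption) for `int16x4x4_t` and the builtin tuple type. In GNU C both are valid ways to reinterpret the bytes; the motivation given in the thread is the generated code.

```c
#include <stdint.h>
#include <string.h>

/* Portable stand-ins (assumption): int16x4x4_t modelled with generic
   vectors, and a 32-byte opaque blob in place of the builtin tuple type.  */
typedef int16_t v4hi __attribute__ ((vector_size (8)));
typedef struct { v4hi val[4]; } v4hi_x4;
typedef struct { unsigned char bytes[32]; } opaque32;

/* Union-based reinterpretation: the style the patch removes.  */
static opaque32
via_union (const v4hi_x4 *v)
{
  union { v4hi_x4 in; opaque32 out; } u = { *v };
  return u.out;
}

/* memcpy-based reinterpretation: the style the patch adopts.  */
static opaque32
via_memcpy (const v4hi_x4 *v)
{
  opaque32 o;
  _Static_assert (sizeof o == sizeof *v, "sizes must match");
  memcpy (&o, v, sizeof o);
  return o;
}
```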

Add new code generation tests to verify that superfluous move
instructions are not generated for the vst1q_x4 intrinsics.

Regression tested and bootstrapped on aarch64-none-linux-gnu - no
issues.

Ok for master?

Thanks,
Jonathan

---

gcc/ChangeLog:

2021-07-21  Jonathan Wright  

* config/aarch64/arm_neon.h (vst1_s8_x4): Use
__builtin_memcpy instead of using a union.
(vst1q_s8_x4): Likewise.
(vst1_s16_x4): Likewise.
(vst1q_s16_x4): Likewise.
(vst1_s32_x4): Likewise.
(vst1q_s32_x4): Likewise.
(vst1_u8_x4): Likewise.
(vst1q_u8_x4): Likewise.
(vst1_u16_x4): Likewise.
(vst1q_u16_x4): Likewise.
(vst1_u32_x4): Likewise.
(vst1q_u32_x4): Likewise.
(vst1_f16_x4): Likewise.
(vst1q_f16_x4): Likewise.
(vst1_f32_x4): Likewise.
(vst1q_f32_x4): Likewise.
(vst1_p8_x4): Likewise.
(vst1q_p8_x4): Likewise.
(vst1_p16_x4): Likewise.
(vst1q_p16_x4): Likewise.
(vst1_s64_x4): Likewise.
(vst1_u64_x4): Likewise.
(vst1_p64_x4): Likewise.
(vst1q_s64_x4): Likewise.
(vst1q_u64_x4): Likewise.
(vst1q_p64_x4): Likewise.
(vst1_f64_x4): Likewise.
(vst1q_f64_x4): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/vector_structure_intrinsics.c: Add new
tests.


rb14697.patch
Description: rb14697.patch


RE: [PATCH 7/8] aarch64: Use memcpy to copy vector tables in vst2[q] intrinsics

2021-07-23 Thread Kyrylo Tkachov via Gcc-patches



> -Original Message-
> From: Jonathan Wright 
> Sent: 23 July 2021 10:38
> To: gcc-patches@gcc.gnu.org
> Cc: Kyrylo Tkachov ; Richard Sandiford
> 
> Subject: [PATCH 7/8] aarch64: Use memcpy to copy vector tables in vst2[q]
> intrinsics
> 
> Hi,
> 
> This patch uses __builtin_memcpy to copy vector structures instead of
> building a new opaque structure one vector at a time in each of the
> vst2[q] Neon intrinsics in arm_neon.h. This simplifies the header file
> and also improves code generation - superfluous move instructions
> were emitted for every register extraction/set in this additional
> structure.
> 
> Add new code generation tests to verify that superfluous move
> instructions are no longer generated for the vst2q intrinsics.
> 
> Regression tested and bootstrapped on aarch64-none-linux-gnu - no
> issues.
> 
> Ok for master?

Ok.
Thanks,
Kyrill

> 
> Thanks,
> Jonathan
> 
> ---
> 
> gcc/ChangeLog:
> 
> 2021-07-21  Jonathan Wright  
> 
>   * config/aarch64/arm_neon.h (vst2_s64): Use __builtin_memcpy
>   instead of constructing __builtin_aarch64_simd_oi one vector
>   at a time.
>   (vst2_u64): Likewise.
>   (vst2_f64): Likewise.
>   (vst2_s8): Likewise.
>   (vst2_p8): Likewise.
>   (vst2_s16): Likewise.
>   (vst2_p16): Likewise.
>   (vst2_s32): Likewise.
>   (vst2_u8): Likewise.
>   (vst2_u16): Likewise.
>   (vst2_u32): Likewise.
>   (vst2_f16): Likewise.
>   (vst2_f32): Likewise.
>   (vst2_p64): Likewise.
>   (vst2q_s8): Likewise.
>   (vst2q_p8): Likewise.
>   (vst2q_s16): Likewise.
>   (vst2q_p16): Likewise.
>   (vst2q_s32): Likewise.
>   (vst2q_s64): Likewise.
>   (vst2q_u8): Likewise.
>   (vst2q_u16): Likewise.
>   (vst2q_u32): Likewise.
>   (vst2q_u64): Likewise.
>   (vst2q_f16): Likewise.
>   (vst2q_f32): Likewise.
>   (vst2q_f64): Likewise.
>   (vst2q_p64): Likewise.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/aarch64/vector_structure_intrinsics.c: Add new
>   tests.



Re: [PATCH 3/8] aarch64: Use memcpy to copy vector tables in vtbl[34] intrinsics

2021-07-23 Thread Richard Sandiford via Gcc-patches
Jonathan Wright  writes:
> I haven't added test cases here because these intrinsics don't map to
> a single instruction (they're legacy from Armv7) and would trip the
> "scan-assembler not mov" that we're using for the other tests.

Ah, OK, fair enough.  Thanks for the explanation.

Richard

>
> Jonathan
> ---
> From: Richard Sandiford 
> Sent: 23 July 2021 10:29
> To: Kyrylo Tkachov 
> Cc: Jonathan Wright ; gcc-patches@gcc.gnu.org
> 
> Subject: Re: [PATCH 3/8] aarch64: Use memcpy to copy vector tables in vtbl[34]
> intrinsics
>  
> Kyrylo Tkachov  writes:
>>> -Original Message-
>>> From: Jonathan Wright 
>>> Sent: 23 July 2021 09:30
>>> To: gcc-patches@gcc.gnu.org
>>> Cc: Kyrylo Tkachov ; Richard Sandiford
>>> 
>>> Subject: [PATCH 3/8] aarch64: Use memcpy to copy vector tables in vtbl[34]
>>> intrinsics
>>>
>>> Hi,
>>>
>>> This patch uses __builtin_memcpy to copy vector structures instead of
>>> building a new opaque structure one vector at a time in each of the
>>> vtbl[34] Neon intrinsics in arm_neon.h. This simplifies the header file
>>> and also improves code generation - superfluous move instructions
>>> were emitted for every register extraction/set in this additional
>>> structure.
>>>
>>> Regression tested and bootstrapped on aarch64-none-linux-gnu - no
>>> issues.
>>>
>>> Ok for master?
>>
>> Ok.
>
> Please add testcases first though. :-)
>
> Thanks,
> Richard


Re: [PATCH 4/8] aarch64: Use memcpy to copy vector tables in vtbx4 intrinsics

2021-07-23 Thread Jonathan Wright via Gcc-patches
Same explanation as for patch 3/8:

I haven't added test cases here because these intrinsics don't map to
a single instruction (they're legacy from Armv7) and would trip the
"scan-assembler not mov" that we're using for the other tests.

Thanks,
Jonathan

From: Richard Sandiford 
Sent: 23 July 2021 10:31
To: Kyrylo Tkachov 
Cc: Jonathan Wright ; gcc-patches@gcc.gnu.org 

Subject: Re: [PATCH 4/8] aarch64: Use memcpy to copy vector tables in vtbx4 
intrinsics

Kyrylo Tkachov  writes:
>> -Original Message-
>> From: Jonathan Wright 
>> Sent: 23 July 2021 10:15
>> To: gcc-patches@gcc.gnu.org
>> Cc: Kyrylo Tkachov ; Richard Sandiford
>> 
>> Subject: [PATCH 4/8] aarch64: Use memcpy to copy vector tables in vtbx4
>> intrinsics
>>
>> Hi,
>>
>> This patch uses __builtin_memcpy to copy vector structures instead of
>> building a new opaque structure one vector at a time in each of the
>> vtbx4 Neon intrinsics in arm_neon.h. This simplifies the header file
>> and also improves code generation - superfluous move instructions
>> were emitted for every register extraction/set in this additional
>> structure.
>>
>> Regression tested and bootstrapped on aarch64-none-linux-gnu - no
>> issues.
>>
>> Ok for master?
>
> Ok.

Here too I think we want some testcases…

Thanks,
Richard


RE: [PATCH 8/8] aarch64: Use memcpy to copy vector tables in vst1[q]_x4 intrinsics

2021-07-23 Thread Kyrylo Tkachov via Gcc-patches



> -Original Message-
> From: Jonathan Wright 
> Sent: 23 July 2021 10:42
> To: gcc-patches@gcc.gnu.org
> Cc: Kyrylo Tkachov ; Richard Sandiford
> 
> Subject: [PATCH 8/8] aarch64: Use memcpy to copy vector tables in
> vst1[q]_x4 intrinsics
> 
> Hi,
> 
> This patch uses __builtin_memcpy to copy vector structures instead of
> using a union in each of the vst1[q]_x4 Neon intrinsics in arm_neon.h.
> 
> Add new code generation tests to verify that superfluous move
> instructions are not generated for the vst1q_x4 intrinsics.
> 
> Regression tested and bootstrapped on aarch64-none-linux-gnu - no
> issues.
> 
> Ok for master?

Ok, good to see that this approach avoids the superfluous moves.
Thanks,
Kyrill

> 
> Thanks,
> Jonathan
> 
> ---
> 
> gcc/ChangeLog:
> 
> 2021-07-21  Jonathan Wright  
> 
>   * config/aarch64/arm_neon.h (vst1_s8_x4): Use
>   __builtin_memcpy instead of using a union.
>   (vst1q_s8_x4): Likewise.
>   (vst1_s16_x4): Likewise.
>   (vst1q_s16_x4): Likewise.
>   (vst1_s32_x4): Likewise.
>   (vst1q_s32_x4): Likewise.
>   (vst1_u8_x4): Likewise.
>   (vst1q_u8_x4): Likewise.
>   (vst1_u16_x4): Likewise.
>   (vst1q_u16_x4): Likewise.
>   (vst1_u32_x4): Likewise.
>   (vst1q_u32_x4): Likewise.
>   (vst1_f16_x4): Likewise.
>   (vst1q_f16_x4): Likewise.
>   (vst1_f32_x4): Likewise.
>   (vst1q_f32_x4): Likewise.
>   (vst1_p8_x4): Likewise.
>   (vst1q_p8_x4): Likewise.
>   (vst1_p16_x4): Likewise.
>   (vst1q_p16_x4): Likewise.
>   (vst1_s64_x4): Likewise.
>   (vst1_u64_x4): Likewise.
>   (vst1_p64_x4): Likewise.
>   (vst1q_s64_x4): Likewise.
>   (vst1q_u64_x4): Likewise.
>   (vst1q_p64_x4): Likewise.
>   (vst1_f64_x4): Likewise.
>   (vst1q_f64_x4): Likewise.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/aarch64/vector_structure_intrinsics.c: Add new
>   tests.



[PATCH, libgomp, OpenMP 5.0] Implement omp_get_device_num

2021-07-23 Thread Chung-Lin Tang

Hi all,
this patch implements the omp_get_device_num API function, which appears
to be a missing piece in the library routines implementation.

The host-side implementation is simple: by specification, it is equivalent to
omp_get_initial_device.

Inside offloaded regions, the preferred approach is for the device to have this
information already initialized (once) when the device is initialized, so that
the function merely returns the stored value.

This implementation adds a convention for an additional entry (referred to as
'others' in the code) returned by the 'load_image' plugin hook. Basically, we
define a variable name in libgomp-plugin.h; the device libgomp defines that
variable, and the offload plugin searches for it and returns its device
start/end addresses so that gomp_load_image_to_device can initialize it. The
device-side omp_get_device_num then just returns that value.
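The load-time convention can be modelled on the host with ordinary memory. This sketch is not the libgomp code. The "plugin" appends one extra address pair covering the device-number variable, and the "loader" copies the assigned number into it; real offloading would do a host-to-device copy rather than memcpy. The pair struct is modelled on `struct addr_pair` from libgomp-plugin.h; every other name is hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Modelled on struct addr_pair in libgomp-plugin.h.  */
struct addr_pair
{
  uintptr_t start;
  uintptr_t end;
};

/* "Device side": stand-ins for GOMP_DEVICE_NUM_VAR and the accessor.  */
static int device_num_var;

static int
model_get_device_num (void)
{
  return device_num_var;
}

/* "Plugin side": NFUNCS ordinary entries, then one extra entry covering
   the device-number variable.  Returns the total number of entries.  */
static int
model_load_image (struct addr_pair *table, int nfuncs)
{
  for (int i = 0; i < nfuncs; i++)
    table[i].start = table[i].end = 0;   /* placeholder entries */
  table[nfuncs].start = (uintptr_t) &device_num_var;
  table[nfuncs].end = table[nfuncs].start + sizeof device_num_var;
  return nfuncs + 1;
}

/* "libgomp side": if the extra entry is present and has the expected
   size, copy the assigned device number into it.  */
static void
model_load_image_to_device (struct addr_pair *table, int n, int nfuncs,
                            int device_num)
{
  if (n == nfuncs + 1
      && table[nfuncs].end - table[nfuncs].start == sizeof (int))
    memcpy ((void *) table[nfuncs].start, &device_num, sizeof device_num);
}
```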

This patch implements this for the gcn and nvptx offload targets. The
icv-device.c file is starting to look ready for consolidating away the
target-specific versions, but that's for later.

Basic libgomp tests were added for C/C++ and Fortran. Tested without regressions
with offloading for amdgcn and nvptx on x86_64-linux host. Okay for trunk?

Thanks,
Chung-Lin

2021-07-23  Chung-Lin Tang  

libgomp/ChangeLog

* icv-device.c (omp_get_device_num): New API function, host side.
* fortran.c (omp_get_device_num_): New interface function.
* libgomp-plugin.h (GOMP_DEVICE_NUM_VAR): Define macro symbol.
* libgomp.map (OMP_5.0.1): Add omp_get_device_num, omp_get_device_num_.
* libgomp.texi (omp_get_device_num): Add documentation for new API
function.
* omp.h.in (omp_get_device_num): Add declaration.
* omp_lib.f90.in (omp_get_device_num): Likewise.
* omp_lib.h.in (omp_get_device_num): Likewise.
* target.c (gomp_load_image_to_device): If additional entry for device
number exists at end of returned entries from 'load_image_func' hook,
copy the assigned device number over to the device variable.

* config/gcn/icv-device.c (GOMP_DEVICE_NUM_VAR): Define static global.
(omp_get_device_num): New API function, device side.
* config/plugin/plugin-gcn.c ("symcat.h"): Add include.
(GOMP_OFFLOAD_load_image): Add addresses of device GOMP_DEVICE_NUM_VAR
at end of returned 'target_table' entries.

* config/nvptx/icv-device.c (GOMP_DEVICE_NUM_VAR): Define static global.
(omp_get_device_num): New API function, device side.
* config/plugin/plugin-nvptx.c ("symcat.h"): Add include.
(GOMP_OFFLOAD_load_image): Add addresses of device GOMP_DEVICE_NUM_VAR
at end of returned 'target_table' entries.

* testsuite/libgomp.c-c++-common/target-45.c: New test.
* testsuite/libgomp.fortran/target10.f90: New test.


diff --git a/libgomp/config/gcn/icv-device.c b/libgomp/config/gcn/icv-device.c
index 72d4f7cff74..8f72028a6c8 100644
--- a/libgomp/config/gcn/icv-device.c
+++ b/libgomp/config/gcn/icv-device.c
@@ -70,6 +70,16 @@ omp_is_initial_device (void)
   return 0;
 }
 
+/* This is set to the device number of current GPU during device initialization,
+   when the offload image containing this libgomp portion is loaded.  */
+static int GOMP_DEVICE_NUM_VAR;
+
+int
+omp_get_device_num (void)
+{
+  return GOMP_DEVICE_NUM_VAR;
+}
+
 ialias (omp_set_default_device)
 ialias (omp_get_default_device)
 ialias (omp_get_initial_device)
diff --git a/libgomp/config/nvptx/icv-device.c b/libgomp/config/nvptx/icv-device.c
index 3b96890f338..e586da1d3a8 100644
--- a/libgomp/config/nvptx/icv-device.c
+++ b/libgomp/config/nvptx/icv-device.c
@@ -58,8 +58,19 @@ omp_is_initial_device (void)
   return 0;
 }
 
+/* This is set to the device number of current GPU during device initialization,
+   when the offload image containing this libgomp portion is loaded.  */
+static int GOMP_DEVICE_NUM_VAR;
+
+int
+omp_get_device_num (void)
+{
+  return GOMP_DEVICE_NUM_VAR;
+}
+
 ialias (omp_set_default_device)
 ialias (omp_get_default_device)
 ialias (omp_get_initial_device)
 ialias (omp_get_num_devices)
 ialias (omp_is_initial_device)
+ialias (omp_get_device_num)
diff --git a/libgomp/fortran.c b/libgomp/fortran.c
index 4ec39c4e61b..2360582e32e 100644
--- a/libgomp/fortran.c
+++ b/libgomp/fortran.c
@@ -598,6 +598,12 @@ omp_get_initial_device_ (void)
   return omp_get_initial_device ();
 }
 
+int32_t
+omp_get_device_num_ (void)
+{
+  return omp_get_device_num ();
+}
+
 int32_t
 omp_get_max_task_priority_ (void)
 {
diff --git a/libgomp/icv-device.c b/libgomp/icv-device.c
index c1bedf46647..f11bdfa85c4 100644
--- a/libgomp/icv-device.c
+++ b/libgomp/icv-device.c
@@ -61,8 +61,17 @@ omp_is_initial_device (void)
   return 1;
 }
 
+int
+omp_get_device_num (void)
+{
+  /* By specification, this is equivalent to omp_get_initial_device
+ on the host.  */
+  return omp_get_initial_device ();
+}
+
 ialias (omp_set_default_devic

Re: [PATCH, libgomp, OpenMP 5.0] Implement omp_get_device_num

2021-07-23 Thread Jakub Jelinek via Gcc-patches
On Fri, Jul 23, 2021 at 06:21:41PM +0800, Chung-Lin Tang wrote:
> --- a/libgomp/icv-device.c
> +++ b/libgomp/icv-device.c
> @@ -61,8 +61,17 @@ omp_is_initial_device (void)
>return 1;
>  }
>  
> +int
> +omp_get_device_num (void)
> +{
> +  /* By specification, this is equivalent to omp_get_initial_device
> + on the host.  */
> +  return omp_get_initial_device ();
> +}
> +

I think this won't work properly with Intel MIC offloading, where the host
libgomp is used in the offloaded code.
For omp_is_initial_device, the plugin solves this in
liboffloadmic/plugin/offload_target_main.cpp by overriding it:
/* Override the corresponding functions from libgomp.  */
extern "C" int
omp_is_initial_device (void) __GOMP_NOTHROW
{
  return 0;
}
   
extern "C" int32_t
omp_is_initial_device_ (void)
{
  return omp_is_initial_device ();
}
but I guess it will need slightly more work, because we need to copy the value
to the offloading device too.
It can be done incrementally, though.

> --- a/libgomp/libgomp-plugin.h
> +++ b/libgomp/libgomp-plugin.h
> @@ -102,6 +102,12 @@ struct addr_pair
>uintptr_t end;
>  };
>  
> +/* This symbol is to name a target side variable that holds the designated
> +   'device number' of the target device. The symbol needs to be available to
> +   libgomp code and the  offload plugin (which in the latter case must be
> +   stringified).  */
> +#define GOMP_DEVICE_NUM_VAR __gomp_device_num

For a single var it is acceptable (though, please avoid the double space
before offload plugin in the comment), but once we have more than one
variable, I think we should simply have a struct which will contain all the
parameters that need to be copied from the host to the offloading device at
image load time (and have eventually another struct that holds parameters
that we'll need to copy to the device on each kernel launch, I bet some ICVs
will be one category, other ICVs another one).
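One possible shape for such a struct, as a sketch only: every name below is hypothetical, and memcpy stands in for whatever host-to-device copy routine a given plugin actually uses.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical block of values copied from the host to the device once,
   at image load time.  New fields can be appended without adding a new
   target-side symbol for each of them.  */
struct gomp_load_time_params
{
  int32_t device_num;
  /* ... further per-device values, e.g. some ICVs ... */
};

/* The single target-side symbol the plugin would look up.  */
struct gomp_load_time_params gomp_target_params;

/* Plugin side: fill in the block and copy it in one transfer.  memcpy
   stands in for the plugin's real host-to-device copy.  */
static void
plugin_copy_load_time_params (void *dev_addr, int32_t device_num)
{
  struct gomp_load_time_params p;

  assert (dev_addr != NULL);
  memset (&p, 0, sizeof p);
  p.device_num = device_num;
  memcpy (dev_addr, &p, sizeof p);
}
```

A second struct for values refreshed on each kernel launch would follow the same pattern with a different copy point.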

> diff --git a/libgomp/libgomp.map b/libgomp/libgomp.map
> index 8ea27b5565f..ffcb98ae99e 100644
> --- a/libgomp/libgomp.map
> +++ b/libgomp/libgomp.map
> @@ -197,6 +197,8 @@ OMP_5.0.1 {
>   omp_get_supported_active_levels_;
>   omp_fulfill_event;
>   omp_fulfill_event_;
> + omp_get_device_num;
> + omp_get_device_num_;
>  } OMP_5.0;

This is wrong.  We've already released GCC 11.1 with the OMP_5.0.1
symbol version, so we must not add any further symbols into that symbol
version.  OpenMP 5.0 routines added in GCC 12 should be OMP_5.0.2 symbol
version.

Jakub



Re: [ARM] PR66791: Replace builtins in vshl_n

2021-07-23 Thread Prathamesh Kulkarni via Gcc-patches
On Fri, 23 Jul 2021 at 15:02, Richard Earnshaw
 wrote:
>
> On 23/07/2021 08:04, Prathamesh Kulkarni via Gcc-patches wrote:
> > On Thu, 22 Jul 2021 at 20:29, Richard Earnshaw
> >  wrote:
> >>
> >>
> >>
> >> On 22/07/2021 14:47, Prathamesh Kulkarni via Gcc-patches wrote:
> >>> On Thu, 22 Jul 2021 at 17:28, Richard Earnshaw
> >>>  wrote:
> 
> 
> 
>  On 22/07/2021 12:32, Prathamesh Kulkarni wrote:
> > On Thu, 22 Jul 2021 at 16:03, Richard Earnshaw
> >  wrote:
> >>
> >>
> >>
> >> On 22/07/2021 08:45, Prathamesh Kulkarni via Gcc-patches wrote:
> >>> Hi,
> >>> The attached patch removes calls to builtins from vshl_n intrinsics,
> >>> and replacing them
> >>> with left shift operator. The patch passes bootstrap+test on
> >>> arm-linux-gnueabihf.
> >>>
> >>> Altho, I noticed, that the patch causes 3 extra registers to spill
> >>> using << instead
> >>> of the builtin for vshl_n.c. Could that be perhaps due to inlining of
> >>> intrinsics ?
> >>> Before patch, the shift operation was performed by call to
> >>> __builtin_neon_vshl (__a, __b)
> >>> and now it's inlined to __a << __b, which might result in increased
> >>> register pressure ?
> >>>
> >>> Thanks,
> >>> Prathamesh
> >>>
> >>
> >>
> >> You're missing a ChangeLog for the patch.
> > Sorry, updated in this patch.
> >>
> >> However, I'm not sure about this.  The register shift form of VSHL
> >> performs a right shift if the value is negative, which is UB if you
> >> write `<<` instead.
> >>
> >> Have I missed something here?
> > Hi Richard,
> > According to this article:
> > https://developer.arm.com/documentation/den0018/a/NEON-Intrinsics-Reference/Shift/VSHL-N
> > For vshl_n, the shift amount is always in the non-negative range for 
> > all types.
> >
> > I tried using vshl_n_s32 (a, -1), and the compiler emitted following 
> > diagnostic:
> > foo.c: In function ‘main’:
> > foo.c:17:1: error: constant -1 out of range 0 - 31
> >  17 | }
> > | ^
> >
> 
>  It does do that now, but that's because the intrinsic expansion does
>  some bounds checking; when you remove the call into the back-end
>  intrinsic that will no-longer happen.
> 
>  I think with this change various things are likely:
> 
>  - We'll no-longer reject non-immediate values, so users will be able to
>  write
> 
>    int b = 5;
>   vshl_n_s32 (a, b);
> 
>  which will expand to a vdup followed by the register form.
> 
>  - we'll rely on the front-end diagnosing out-of range shifts
> 
>  - code of the form
> 
>   int b = -1;
>   vshl_n_s32 (a, b);
> 
>  will probably now go through without any errors, especially at low
>  optimization levels.  It may end up doing what the user wanted, but it's
>  definitely a change in behaviour - and perhaps worse, the compiler might
>  diagnose the above as UB and silently throw some stuff away.
> 
>  It might be that we need to insert some form of static assertion that
>  the second argument is a __builtin_constant_p().
> >>> Ah right, thanks for the suggestions!
> >>> I tried the above example:
> >>> int b = -1;
> >>> vshl_n_s32 (a, b);
> >>> and it compiled without any errors with -O0 after patch.
> >>>
> >>> Would it be OK to use _Static_assert (__builtin_constant_p (b)) to
> >>> guard against non-immediate values ?
> >>>
> >>> With the following change:
> >>> __extension__ extern __inline int32x2_t
> >>> __attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
> >>> vshl_n_s32 (int32x2_t __a, const int __b)
> >>> {
> >>>   _Static_assert (__builtin_constant_p (__b));
> >>>   return __a << __b;
> >>> }
> >>>
> >>> the above example fails at -O0:
> >>> ../armhf-build/gcc/include/arm_neon.h: In function ‘vshl_n_s32’:
> >>> ../armhf-build/gcc/include/arm_neon.h:4904:3: error: static assertion 
> >>> failed
> >>>   4904 |   _Static_assert (__builtin_constant_p (__b));
> >>>|   ^~
> >>
> >> I've been playing with that but unfortunately it doesn't seem to work in
> >> the way we want it to.  For a complete test:
> >>
> >>
> >>
> >> typedef __simd64_int32_t int32x2_t;
> >>
> >> __extension__ extern __inline int32x2_t
> >> __attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
> >> vshl_n_s32 (int32x2_t __a, const int __b)
> >> {
> >>   _Static_assert (__builtin_constant_p (__b), "Second argument must be a litteral constant");
> >>   return __a << __b;
> >> }
> >>
> >> int32x2_t f (int32x2_t x, const int b)
> >> {
> >>   return vshl_n_s32 (x, 1);
> >> }
> >>
> >> At -O0 I get:
> >>
> >> test.c: In function ‘vshl_n_s32’:
> >> test.c:7:3: error: static assertion failed: "Second argument must be a
> >> litteral constant"
> >>  7 |   _Static_assert

[committed] libstdc++: Update documentation comments for namespace rel_ops

2021-07-23 Thread Jonathan Wakely via Gcc-patches
The comments in  describe problems that were solved
years ago (for GCC 3.1). The comparison operators in  are no
longer ambiguous with the rel_ops ones, so the linked mailing list
thread and FAQ entry aren't relevant now. The reference to std_utility.h
is also outdated as it's just called utility now, both in the source
tree and when installed.

The use of rel_ops is still frowned upon though, so replace the
discussion of ambiguities within libstdc++ headers with an admonition
about using rel_ops in user code.

Signed-off-by: Jonathan Wakely 

libstdc++-v3/ChangeLog:

* include/bits/stl_relops.h: Update documentation comments.

Tested powerpc64le-linux. Committed to trunk.

commit 5b965dc49a6a4293ce85bc3a24ca3f3855469e68
Author: Jonathan Wakely 
Date:   Fri Jul 23 11:03:23 2021

diff --git a/libstdc++-v3/include/bits/stl_relops.h b/libstdc++-v3/include/bits/stl_relops.h
index 276894c435a..ef522031318 100644
--- a/libstdc++-v3/include/bits/stl_relops.h
+++ b/libstdc++-v3/include/bits/stl_relops.h
@@ -52,13 +52,8 @@
  *  This is an internal header file, included by other library headers.
  *  Do not attempt to use it directly. @headername{utility}
  *
- *  Inclusion of this file has been removed from
- *  all of the other STL headers for safety reasons, except std_utility.h.
- *  For more information, see the thread of about twenty messages starting
- *  with http://gcc.gnu.org/ml/libstdc++/2001-01/msg00223.html, or
- *  http://gcc.gnu.org/onlinedocs/libstdc++/faq.html#faq.ambiguous_overloads
- *
- *  Short summary: the rel_ops operators should be avoided for the present.
+ *  This file is only included by ``, which is required by the
+ *  standard to define namespace `rel_ops` and its contents.
  */
 
 #ifndef _STL_RELOPS_H
@@ -72,6 +67,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   {
 /** @namespace std::rel_ops
  *  @brief  The generated relational operators are sequestered here.
+ *
+ *  Libstdc++ headers must not use the contents of `rel_ops`.
+ *  User code should also avoid them, because unconstrained function
+ *  templates are too greedy and can easily cause ambiguities.
+ *
+ *  C++20 default comparisons are a better solution.
  */
 
 /**


Re: [PATCH, libgomp, OpenMP 5.0] Implement omp_get_device_num

2021-07-23 Thread Tobias Burnus

On 23.07.21 12:21, Chung-Lin Tang wrote:

Inside offloaded regions, the preferred way should be that the device
already has this information initialized (once) when the device is
initialized, and the function merely returns the stored value.

...

+++ b/libgomp/testsuite/libgomp.c-c++-common/target-45.c
@@ -0,0 +1,23 @@
+/* { dg-do run { target { offload_target_nvptx || offload_target_amdgcn } } } */

...

+  int device_num;
+  #pragma omp target map(from: device_num)
+  {
+    device_num = omp_get_device_num ();
+  }
+
+  if (host_device_num == device_num)
+    abort ();


I personally prefer having:
   int initial_device;
and inside 'omp target' (with 'map(from: initial_device)'):
   initial_device = omp_is_initial_device ();

Then the check would be:
  if (initial_device && host_device_num != device_num)
    abort ();
  if (!initial_device && host_device_num == device_num)
    abort ();

(Likewise for Fortran.)
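The two conditions can be read as one predicate. A small sketch with a hypothetical helper name, where the inputs are the values the test would collect (omp_is_initial_device () and omp_get_device_num () inside the target region, omp_get_device_num () on the host):

```c
#include <assert.h>
#include <stdbool.h>

/* Returns true when the device number observed inside the target region
   is consistent with where the region actually ran.  */
static bool
device_num_ok (bool initial_device, int host_device_num, int device_num)
{
  assert (host_device_num >= 0);   /* device numbers are non-negative */
  if (initial_device)
    /* Host fallback: the region ran on the host, so the numbers match.  */
    return host_device_num == device_num;
  /* Real offload device: its number must differ from the host's.  */
  return host_device_num != device_num;
}
```

Written this way, the test passes both on systems with a usable offload device and on those that fall back to the host, which is what makes the nvptx/gcn restriction unnecessary.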

And instead of restricting the target to nvptx/gcn, we could just add
dg-xfail-run-if for *-intelmic-* and *-intelmicemul-*.

Additionally, offload_target_nvptx/...amdgcn only check whether
compilation support is available, not whether a device exists
at run time.
(Device availability is checked by target_offload_device,
using omp_is_initial_device ().)

Tobias

PS: For completeness, I want to note that OpenMP 5.1 supports
setting per-device ICVs via environment variables:
besides inheriting the generic ICV values, device-specific
settings are possible with:
  _DEV[_]
Thus, more data will be passed from libgomp to the plugins
in the future.

-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955


Re: [PATCH v2] gcov: Add __gcov_info_to_gdca()

2021-07-23 Thread Sebastian Huber

On 23/07/2021 09:16, Martin Liška wrote:

On 7/23/21 9:06 AM, Sebastian Huber wrote:

On 23/07/2021 08:52, Martin Liška wrote:

+#ifdef NEED_L_GCOV_INFO_TO_GCDA
+/* Convert the gcov info to a gcda data stream.  It is intended for
+   free-standing environments which do not support the C library file I/O.  */

+
+void
+__gcov_info_to_gcda (const struct gcov_info *gi_ptr,
+ void (*filename) (const char *, void *),


What about begin_filename_fn?


+ void (*dump) (const void *, unsigned, void *),
+ void *(*allocate) (unsigned, void *),
+ void *arg)
+{
+  (*filename) (gi_ptr->filename, arg);
+  write_one_data (gi_ptr, NULL, dump, allocate, arg);
+}
+#endif /* NEED_L_GCOV_INFO_TO_GCDA */



About gcov_write_summary: it should also be dumped in order to have 
a complete .gcda file, right?


How can I get access to the summary information? Here it is not 
available:


You only need to change gcov_write_summary in gcov-io.c.


Sorry, I still don't know how I can get the summary information if I 
only have a pointer to the gcov_info structure which does not contain 
a summary member.


You're right, sorry! But in your case, it will be simple to re-create 
it with a script on the host system.



gcov_write_summary (gcov_unsigned_t tag, const struct gcov_summary *summary)
{
   gcov_write_tag_length (tag, GCOV_TAG_SUMMARY_LENGTH);
   gcov_write_unsigned (summary->runs);
   gcov_write_unsigned (summary->sum_max);
}

Here summary->runs will be 1 and sum_max is the maximum counter value 
during the run.
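Re-creating the summary on the host could look roughly like this. It is a sketch under assumptions: the struct is a hypothetical host-side mirror of the two values gcov_write_summary emits, and which counters to scan is left to the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical host-side mirror of the values written by
   gcov_write_summary.  */
struct summary_sketch
{
  uint32_t runs;
  uint64_t sum_max;
};

/* Single-run summary: runs is 1 and sum_max is the largest counter value
   seen in the data received from the target.  */
static struct summary_sketch
recreate_summary (const uint64_t *counters, size_t n)
{
  struct summary_sketch s = { 1, 0 };

  assert (counters != NULL || n == 0);
  for (size_t i = 0; i < n; i++)
    if (counters[i] > s.sum_max)
      s.sum_max = counters[i];
  return s;
}
```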


This __gcov_info_to_gcda() is just a low-level piece which is necessary 
to get the gcda stream from an embedded system to a host without having 
to know the details of gcov. In follow up patches we could think about a 
standard format to serialize the gcda stream.  For this format we could 
add support to the host gcov tools. One of the tools could read this 
stream and output proper *.gcda files.
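For illustration, callbacks matching the three function-pointer types in the patch (filename, dump, allocate) might collect the stream into a buffer for later transmission. This is a hypothetical sketch: the sink, the NUL-terminated file-name record and the static bump allocator are choices made here, not part of the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical byte sink, e.g. a buffer later sent over a serial line.  */
struct gcda_sink
{
  unsigned char buf[1024];
  size_t len;
};

static void
sink_append (struct gcda_sink *s, const void *data, size_t n)
{
  assert (n <= sizeof s->buf - s->len);   /* sketch: no overflow handling */
  memcpy (s->buf + s->len, data, n);
  s->len += n;
}

/* `filename` callback: record the name including its terminating NUL.  */
static void
sink_filename (const char *name, void *arg)
{
  sink_append (arg, name, strlen (name) + 1);
}

/* `dump` callback: append the raw gcda bytes.  */
static void
sink_dump (const void *data, unsigned n, void *arg)
{
  sink_append (arg, data, n);
}

/* `allocate` callback: a static bump allocator, so no malloc is needed in
   a free-standing environment.  Memory is never freed.  */
static void *
sink_allocate (unsigned n, void *arg)
{
  static _Alignas (8) unsigned char arena[4096];
  static size_t used;
  void *p;

  (void) arg;
  if (n > sizeof arena)
    return NULL;
  n = (n + 7u) & ~7u;                      /* keep 8-byte alignment */
  if (n > sizeof arena - used)
    return NULL;
  p = arena + used;
  used += n;
  return p;
}

/* With libgcov linked and gi_ptr obtained elsewhere, the call would be:
     __gcov_info_to_gcda (gi_ptr, sink_filename, sink_dump,
                          sink_allocate, &sink);  */
```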


--
embedded brains GmbH
Herr Sebastian HUBER
Dornierstr. 4
82178 Puchheim
Germany
email: sebastian.hu...@embedded-brains.de
phone: +49-89-18 94 741 - 16
fax:   +49-89-18 94 741 - 08

Registergericht: Amtsgericht München
Registernummer: HRB 157899
Vertretungsberechtigte Geschäftsführer: Peter Rasmussen, Thomas Dörfler
Unsere Datenschutzerklärung finden Sie hier:
https://embedded-brains.de/datenschutzerklaerung/


[committed] libstdc++: Reduce headers included by

2021-07-23 Thread Jonathan Wakely via Gcc-patches
The  header only needs std::atomic_flag, so can include
 instead of the whole of .

libstdc++-v3/ChangeLog:

* include/std/future: Include  instead of
.

Tested powerpc64le-linux. Committed to trunk.

commit 3ea62a2b2ed739209936e0ed27539965ae4c9840
Author: Jonathan Wakely 
Date:   Fri Jul 23 12:32:05 2021

diff --git a/libstdc++-v3/include/std/future b/libstdc++-v3/include/std/future
index 09e54c3703b..ace0c311f1a 100644
--- a/libstdc++-v3/include/std/future
+++ b/libstdc++-v3/include/std/future
@@ -38,9 +38,10 @@
 #include  // call_once
 #include  // __at_thread_exit_elt
 #include 
-#include 
+#include  // atomic_flag
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 


PING^4: [PATCH] mips: add MSA vec_cmp and vec_cmpu expand pattern [PR101132]

2021-07-23 Thread Xi Ruoyao via Gcc-patches
Ping again.

On Mon, 2021-06-21 at 21:42 +0800, Xi Ruoyao wrote:
> The middle-end has emitted vec_cmp and vec_cmpu since GCC 11, causing
> an ICE on MIPS with MSA enabled.  Add the patterns to prevent it.
> 
> Bootstrapped and regression tested on mips64el-linux-gnu.
> Ok for trunk?
> 
> gcc/
> 
> * config/mips/mips-protos.h (mips_expand_vec_cmp_expr): Declare.
> * config/mips/mips.c (mips_expand_vec_cmp_expr): New function.
> * config/mips/mips-msa.md (vec_cmp): New
>   expander.
>   (vec_cmpu): New expander.
> ---
>  gcc/config/mips/mips-msa.md   | 22 ++
>  gcc/config/mips/mips-protos.h |  1 +
>  gcc/config/mips/mips.c    | 11 +++
>  3 files changed, 34 insertions(+)
> 
> diff --git a/gcc/config/mips/mips-msa.md b/gcc/config/mips/mips-msa.md
> index 3ecf2bde19f..3a67f25be56 100644
> --- a/gcc/config/mips/mips-msa.md
> +++ b/gcc/config/mips/mips-msa.md
> @@ -435,6 +435,28 @@
>    DONE;
>  })
>  
> +(define_expand "vec_cmp"
> +  [(match_operand: 0 "register_operand")
> +   (match_operator 1 ""
> + [(match_operand:MSA 2 "register_operand")
> +  (match_operand:MSA 3 "register_operand")])]
> +  "ISA_HAS_MSA"
> +{
> +  mips_expand_vec_cmp_expr (operands);
> +  DONE;
> +})
> +
> +(define_expand "vec_cmpu"
> +  [(match_operand: 0 "register_operand")
> +   (match_operator 1 ""
> + [(match_operand:IMSA 2 "register_operand")
> +  (match_operand:IMSA 3 "register_operand")])]
> +  "ISA_HAS_MSA"
> +{
> +  mips_expand_vec_cmp_expr (operands);
> +  DONE;
> +})
> +
>  (define_insn "msa_insert_"
>    [(set (match_operand:MSA 0 "register_operand" "=f,f")
> (vec_merge:MSA
> diff --git a/gcc/config/mips/mips-protos.h b/gcc/config/mips/mips-protos.h
> index 2cf4ed50292..a685f7f7dd5 100644
> --- a/gcc/config/mips/mips-protos.h
> +++ b/gcc/config/mips/mips-protos.h
> @@ -385,6 +385,7 @@ extern mulsidi3_gen_fn mips_mulsidi3_gen_fn (enum rtx_code);
>  
>  extern void mips_register_frame_header_opt (void);
>  extern void mips_expand_vec_cond_expr (machine_mode, machine_mode, rtx *);
> +extern void mips_expand_vec_cmp_expr (rtx *);
>  
>  /* Routines implemented in mips-d.c  */
>  extern void mips_d_target_versions (void);
> diff --git a/gcc/config/mips/mips.c b/gcc/config/mips/mips.c
> index 00a8eef96aa..8f043399a8e 100644
> --- a/gcc/config/mips/mips.c
> +++ b/gcc/config/mips/mips.c
> @@ -22321,6 +22321,17 @@ mips_expand_msa_cmp (rtx dest, enum rtx_code cond, rtx op0, rtx op1)
>  }
>  }
>  
> +void
> +mips_expand_vec_cmp_expr (rtx *operands)
> +{
> +  rtx cond = operands[1];
> +  rtx op0 = operands[2];
> +  rtx op1 = operands[3];
> +  rtx res = operands[0];
> +
> +  mips_expand_msa_cmp (res, GET_CODE (cond), op0, op1);
> +}
> +
>  /* Expand VEC_COND_EXPR, where:
>     MODE is mode of the result
>     VIMODE equivalent integer mode

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University



[PATCH] IBM Z: Fix 5 tests in 31-bit mode

2021-07-23 Thread Ilya Leoshkevich via Gcc-patches
Bootstrapped and regtested on s390x-redhat-linux.  Ok for master?



gcc/testsuite/ChangeLog:

* gcc.target/s390/global-array-element-pic2.c: Add -mzarch, add
an expectation for 31-bit mode.
* gcc.target/s390/load-imm64-1.c: Use unsigned long long.
* gcc.target/s390/load-imm64-2.c: Likewise.
* gcc.target/s390/vector/long-double-vx-macro-off-on.c: Use
-mzarch.
* gcc.target/s390/vector/long-double-vx-macro-on-off.c:
Likewise.
---
 gcc/testsuite/gcc.target/s390/global-array-element-pic2.c| 5 +++--
 gcc/testsuite/gcc.target/s390/load-imm64-1.c | 4 ++--
 gcc/testsuite/gcc.target/s390/load-imm64-2.c | 4 ++--
 .../gcc.target/s390/vector/long-double-vx-macro-off-on.c | 2 +-
 .../gcc.target/s390/vector/long-double-vx-macro-on-off.c | 2 +-
 5 files changed, 9 insertions(+), 8 deletions(-)

diff --git a/gcc/testsuite/gcc.target/s390/global-array-element-pic2.c b/gcc/testsuite/gcc.target/s390/global-array-element-pic2.c
index 72b87d40b85..0ee10841cac 100644
--- a/gcc/testsuite/gcc.target/s390/global-array-element-pic2.c
+++ b/gcc/testsuite/gcc.target/s390/global-array-element-pic2.c
@@ -1,6 +1,6 @@
 /* Test accesses to global array elements in PIC code.  */
 /* { dg-do compile } */
-/* { dg-options "-O1 -march=z10 -fPIC" } */
+/* { dg-options "-O1 -march=z10 -mzarch -fPIC" } */
 
 extern char a[] __attribute__ ((aligned (2)));
 extern char *b;
@@ -8,6 +8,7 @@ extern char *b;
 void c()
 {
   b = a + 4;
-  /* { dg-final { scan-assembler "(?n)\n\tlgrl\t%r\\d+,a@GOTENT\n" } } */
+  /* { dg-final { scan-assembler "(?n)\n\tlgrl\t%r\\d+,a@GOTENT\n" { target lp64 } } } */
+  /* { dg-final { scan-assembler "(?n)\n\tlrl\t%r\\d+,a@GOTENT\n" { target { ! lp64 } } } } */
   /* { dg-final { scan-assembler-not "(?n)\n\tlarl\t%r\\d+,a\[^@\]" } } */
 }
diff --git a/gcc/testsuite/gcc.target/s390/load-imm64-1.c b/gcc/testsuite/gcc.target/s390/load-imm64-1.c
index 03d17f59096..8e812f2f01d 100644
--- a/gcc/testsuite/gcc.target/s390/load-imm64-1.c
+++ b/gcc/testsuite/gcc.target/s390/load-imm64-1.c
@@ -4,10 +4,10 @@
 /* { dg-do compile } */
 /* { dg-options "-O3 -march=z9-109" } */
 
-unsigned long
+unsigned long long
 magic (void)
 {
-  return 0x3f08c5392f756cd;
+  return 0x3f08c5392f756cdULL;
 }
 
 /* { dg-final { scan-assembler-times {\n\tllihf\t} 1 { target lp64 } } } */
diff --git a/gcc/testsuite/gcc.target/s390/load-imm64-2.c b/gcc/testsuite/gcc.target/s390/load-imm64-2.c
index ee0ff3b0a91..c3536b4d031 100644
--- a/gcc/testsuite/gcc.target/s390/load-imm64-2.c
+++ b/gcc/testsuite/gcc.target/s390/load-imm64-2.c
@@ -4,10 +4,10 @@
 /* { dg-do compile } */
 /* { dg-options "-O3 -march=z10" } */
 
-unsigned long
+unsigned long long
 magic (void)
 {
-  return 0x3f08c5392f756cd;
+  return 0x3f08c5392f756cdULL;
 }
 
 /* { dg-final { scan-assembler-times {\n\tllihf\t} 1 { target lp64 } } } */
diff --git a/gcc/testsuite/gcc.target/s390/vector/long-double-vx-macro-off-on.c b/gcc/testsuite/gcc.target/s390/vector/long-double-vx-macro-off-on.c
index 2d67679bb11..513912e669d 100644
--- a/gcc/testsuite/gcc.target/s390/vector/long-double-vx-macro-off-on.c
+++ b/gcc/testsuite/gcc.target/s390/vector/long-double-vx-macro-off-on.c
@@ -1,6 +1,6 @@
 /* { dg-do compile } */
 /* { dg-require-effective-target target_attribute } */
-/* { dg-options "-march=z14" } */
+/* { dg-options "-march=z14 -mzarch" } */
 #if !defined(__LONG_DOUBLE_VX__)
 #error
 #endif
diff --git a/gcc/testsuite/gcc.target/s390/vector/long-double-vx-macro-on-off.c b/gcc/testsuite/gcc.target/s390/vector/long-double-vx-macro-on-off.c
index 6f264313408..6b3cb321338 100644
--- a/gcc/testsuite/gcc.target/s390/vector/long-double-vx-macro-on-off.c
+++ b/gcc/testsuite/gcc.target/s390/vector/long-double-vx-macro-on-off.c
@@ -1,6 +1,6 @@
 /* { dg-do compile } */
 /* { dg-require-effective-target target_attribute } */
-/* { dg-options "-march=z13" } */
+/* { dg-options "-march=z13 -mzarch" } */
 #if defined(__LONG_DOUBLE_VX__)
 #error
 #endif
-- 
2.31.1
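The load-imm64 change follows from the 31-bit ABI being ILP32: `unsigned long` is only 32 bits there, so returning the 64-bit constant through it truncates the value. A small sketch of the difference; only `magic` mirrors the test, and the second helper is hypothetical.

```c
#include <assert.h>

/* unsigned long long is at least 64 bits in both -m31 and -m64, so the
   whole constant survives.  The ULL suffix makes the literal's type
   explicit.  */
unsigned long long
magic (void)
{
  return 0x3f08c5392f756cdULL;
}

/* What the old `unsigned long` return type keeps of the constant:
   everything on LP64, only the low 32 bits on ILP32.  */
static unsigned long
magic_as_unsigned_long (void)
{
  return (unsigned long) 0x3f08c5392f756cdULL;
}
```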



PING^4: [PATCH] mips: Fix up mips_atomic_assign_expand_fenv [PR94780]

2021-07-23 Thread Xi Ruoyao via Gcc-patches
Ping again.

On Wed, 2021-06-23 at 11:11 +0800, Xi Ruoyao wrote:
> Commit message shamelessly copied from 1777beb6b129 by jakub:
> 
> This function, because it is sometimes called even outside of function
> bodies, uses create_tmp_var_raw rather than create_tmp_var.  But in order
> for that to work, when first referenced, the VAR_DECLs need to appear in a
> TARGET_EXPR so that during gimplification the var gets the right
> DECL_CONTEXT and is added to local decls.
> 
> Bootstrapped & regtested on mips64el-linux-gnu.  Ok for trunk and backport
> to 11, 10, and 9?
> 
> gcc/
> 
> * config/mips/mips.c (mips_atomic_assign_expand_fenv): Use
>   TARGET_EXPR instead of MODIFY_EXPR.
> ---
>  gcc/config/mips/mips.c | 12 ++--
>  1 file changed, 6 insertions(+), 6 deletions(-)
> 
> diff --git a/gcc/config/mips/mips.c b/gcc/config/mips/mips.c
> index 8f043399a8e..89d1be6cea6 100644
> --- a/gcc/config/mips/mips.c
> +++ b/gcc/config/mips/mips.c
> @@ -22439,12 +22439,12 @@ mips_atomic_assign_expand_fenv (tree *hold, tree *clear, tree *update)
>    tree get_fcsr = mips_builtin_decls[MIPS_GET_FCSR];
>    tree set_fcsr = mips_builtin_decls[MIPS_SET_FCSR];
>    tree get_fcsr_hold_call = build_call_expr (get_fcsr, 0);
> -  tree hold_assign_orig = build2 (MODIFY_EXPR, MIPS_ATYPE_USI,
> - fcsr_orig_var, get_fcsr_hold_call);
> +  tree hold_assign_orig = build4 (TARGET_EXPR, MIPS_ATYPE_USI,
> + fcsr_orig_var, get_fcsr_hold_call, NULL, NULL);
>    tree hold_mod_val = build2 (BIT_AND_EXPR, MIPS_ATYPE_USI, fcsr_orig_var,
>   build_int_cst (MIPS_ATYPE_USI, 0xf003));
> -  tree hold_assign_mod = build2 (MODIFY_EXPR, MIPS_ATYPE_USI,
> -    fcsr_mod_var, hold_mod_val);
> +  tree hold_assign_mod = build4 (TARGET_EXPR, MIPS_ATYPE_USI,
> +    fcsr_mod_var, hold_mod_val, NULL, NULL);
>    tree set_fcsr_hold_call = build_call_expr (set_fcsr, 1, fcsr_mod_var);
>    tree hold_all = build2 (COMPOUND_EXPR, MIPS_ATYPE_USI,
>   hold_assign_orig, hold_assign_mod);
> @@ -22454,8 +22454,8 @@ mips_atomic_assign_expand_fenv (tree *hold, tree *clear, tree *update)
>    *clear = build_call_expr (set_fcsr, 1, fcsr_mod_var);
>  
>    tree get_fcsr_update_call = build_call_expr (get_fcsr, 0);
> -  *update = build2 (MODIFY_EXPR, MIPS_ATYPE_USI,
> -   exceptions_var, get_fcsr_update_call);
> +  *update = build4 (TARGET_EXPR, MIPS_ATYPE_USI,
> +   exceptions_var, get_fcsr_update_call, NULL, NULL);
>    tree set_fcsr_update_call = build_call_expr (set_fcsr, 1, fcsr_orig_var);
>    *update = build2 (COMPOUND_EXPR, void_type_node, *update,
>     set_fcsr_update_call);

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University



Re: [PATCH 3/3] [PR libfortran/101305] Fix ISO_Fortran_binding.h paths in gfortran testsuite

2021-07-23 Thread Tobias Burnus

Hi Sandra,

On 21.07.21 12:17, Tobias Burnus wrote:

On 13.07.21 23:28, Sandra Loosemore wrote:

ISO_Fortran_binding.h is now generated in the libgfortran build
directory where it is on the default include path.  Adjust includes in
the gfortran testsuite not to include an explicit path pointing at the
source directory.

...

-#include "../../../libgfortran/ISO_Fortran_binding.h"
+#include "ISO_Fortran_binding.h"

Unfortunately, that does not help.


It seems as if the following works in the *.exp file:

# Flags for finding libgfortran ISO*.h files.
if [info exists TOOL_OPTIONS] {
   set specpath [get_multilibs ${TOOL_OPTIONS}]
} else {
   set specpath [get_multilibs]
}
set options "-I $specpath/libgfortran/"

I am not sure whether that should/can be added into
  gfortran.dg/dg.exp
or whether we only want to do this in
  ts29113/ts29113.exp
a.k.a.
  c-interop/interop.exp
  f18-c-interop/interop.exp
  ...

That seems to work fine with -m32 and -m64.

Tobias



[PATCH] aarch64: Use memcpy to copy vector tables in vst1[q]_x3 intrinsics

2021-07-23 Thread Jonathan Wright via Gcc-patches
Hi,

This patch uses __builtin_memcpy to copy vector structures instead of
building a new opaque structure one vector at a time in each of the
vst1[q]_x3 Neon intrinsics in arm_neon.h. This simplifies the header file
and also improves code generation - superfluous move instructions
were emitted for every register extraction/set in this additional
structure.
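The idea can be shown portably: copying the whole tuple with one __builtin_memcpy produces the same value as filling the opaque structure one vector at a time. In this sketch both types are plain structs standing in for the arm_neon.h tuple type (like int32x4x3_t) and the opaque __builtin_aarch64_simd_ci. It only demonstrates equivalence of the values; the code-generation benefit comes from how the aarch64 back end treats its opaque modes and is not reproduced here.

```c
#include <assert.h>

typedef int v4si __attribute__ ((vector_size (16)));

/* Stand-ins for the tuple type and the opaque three-register type.  */
typedef struct { v4si val[3]; } v4si_x3;
typedef struct { v4si reg[3]; } opaque_x3;

_Static_assert (sizeof (v4si_x3) == sizeof (opaque_x3),
                "the copy relies on identical sizes");

/* Old style: build the opaque value one vector at a time.  */
static opaque_x3
pack_elementwise (v4si_x3 t)
{
  opaque_x3 o;
  o.reg[0] = t.val[0];
  o.reg[1] = t.val[1];
  o.reg[2] = t.val[2];
  return o;
}

/* New style: copy the whole tuple in one go.  */
static opaque_x3
pack_memcpy (v4si_x3 t)
{
  opaque_x3 o;
  __builtin_memcpy (&o, &t, sizeof o);
  return o;
}
```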

Add new code generation tests to verify that superfluous move
instructions are not generated for the vst1q_x3 intrinsics.

Regression tested and bootstrapped on aarch64-none-linux-gnu - no
issues.

Ok for master?

Thanks,
Jonathan

---

gcc/ChangeLog:

2021-07-23  Jonathan Wright  

* config/aarch64/arm_neon.h (vst1_s64_x3): Use
__builtin_memcpy instead of constructing
__builtin_aarch64_simd_ci one vector at a time.
(vst1_u64_x3): Likewise.
(vst1_f64_x3): Likewise.
(vst1_s8_x3): Likewise.
(vst1_p8_x3): Likewise.
(vst1_s16_x3): Likewise.
(vst1_p16_x3): Likewise.
(vst1_s32_x3): Likewise.
(vst1_u8_x3): Likewise.
(vst1_u16_x3): Likewise.
(vst1_u32_x3): Likewise.
(vst1_f16_x3): Likewise.
(vst1_f32_x3): Likewise.
(vst1_p64_x3): Likewise.
(vst1q_s8_x3): Likewise.
(vst1q_p8_x3): Likewise.
(vst1q_s16_x3): Likewise.
(vst1q_p16_x3): Likewise.
(vst1q_s32_x3): Likewise.
(vst1q_s64_x3): Likewise.
(vst1q_u8_x3): Likewise.
(vst1q_u16_x3): Likewise.
(vst1q_u32_x3): Likewise.
(vst1q_u64_x3): Likewise.
(vst1q_f16_x3): Likewise.
(vst1q_f32_x3): Likewise.
(vst1q_f64_x3): Likewise.
(vst1q_p64_x3): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/vector_structure_intrinsics.c: Add new
tests.


rb14700.patch
Description: rb14700.patch


[PATCH] aarch64: Use memcpy to copy vector tables in vst1[q]_x2 intrinsics

2021-07-23 Thread Jonathan Wright via Gcc-patches
Hi,

This patch uses __builtin_memcpy to copy vector structures instead of
building a new opaque structure one vector at a time in each of the
vst1[q]_x2 Neon intrinsics in arm_neon.h. This simplifies the header
file and also improves code generation - superfluous move
instructions were emitted for every register extraction/set in this
additional structure.

Add new code generation tests to verify that superfluous move
instructions are not generated for the vst1q_x2 intrinsics.

Regression tested and bootstrapped on aarch64-none-linux-gnu - no
issues.

Ok for master?

Thanks,
Jonathan

---

gcc/ChangeLog:

2021-07-23  Jonathan Wright  

* config/aarch64/arm_neon.h (vst1_s64_x2): Use
__builtin_memcpy instead of constructing
__builtin_aarch64_simd_oi one vector at a time.
(vst1_u64_x2): Likewise.
(vst1_f64_x2): Likewise.
(vst1_s8_x2): Likewise.
(vst1_p8_x2): Likewise.
(vst1_s16_x2): Likewise.
(vst1_p16_x2): Likewise.
(vst1_s32_x2): Likewise.
(vst1_u8_x2): Likewise.
(vst1_u16_x2): Likewise.
(vst1_u32_x2): Likewise.
(vst1_f16_x2): Likewise.
(vst1_f32_x2): Likewise.
(vst1_p64_x2): Likewise.
(vst1q_s8_x2): Likewise.
(vst1q_p8_x2): Likewise.
(vst1q_s16_x2): Likewise.
(vst1q_p16_x2): Likewise.
(vst1q_s32_x2): Likewise.
(vst1q_s64_x2): Likewise.
(vst1q_u8_x2): Likewise.
(vst1q_u16_x2): Likewise.
(vst1q_u32_x2): Likewise.
(vst1q_u64_x2): Likewise.
(vst1q_f16_x2): Likewise.
(vst1q_f32_x2): Likewise.
(vst1q_f64_x2): Likewise.
(vst1q_p64_x2): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/vector_structure_intrinsics.c: Add new
tests.


rb14701.patch
Description: rb14701.patch


RE: [PATCH] aarch64: Use memcpy to copy vector tables in vst1[q]_x3 intrinsics

2021-07-23 Thread Kyrylo Tkachov via Gcc-patches



> -Original Message-
> From: Jonathan Wright 
> Sent: 23 July 2021 15:22
> To: gcc-patches@gcc.gnu.org
> Cc: Kyrylo Tkachov ; Richard Sandiford
> 
> Subject: [PATCH] aarch64: Use memcpy to copy vector tables in vst1[q]_x3
> intrinsics
> 
> Hi,
> 
> This patch uses __builtin_memcpy to copy vector structures instead of
> building a new opaque structure one vector at a time in each of the
> vst1[q]_x3 Neon intrinsics in arm_neon.h. This simplifies the header file
> and also improves code generation - superfluous move instructions
> were emitted for every register extraction/set in this additional
> structure.
> 
> Add new code generation tests to verify that superfluous move
> instructions are not generated for the vst1q_x3 intrinsics.
> 
> Regression tested and bootstrapped on aarch64-none-linux-gnu - no
> issues.
> 
> Ok for master?

Ok.
Thanks,
Kyrill

> 
> Thanks,
> Jonathan
> 
> ---
> 
> gcc/ChangeLog:
> 
> 2021-07-23  Jonathan Wright  
> 
>   * config/aarch64/arm_neon.h (vst1_s64_x3): Use
>   __builtin_memcpy instead of constructing
>   __builtin_aarch64_simd_ci one vector at a time.
>   (vst1_u64_x3): Likewise.
>   (vst1_f64_x3): Likewise.
>   (vst1_s8_x3): Likewise.
>   (vst1_p8_x3): Likewise.
>   (vst1_s16_x3): Likewise.
>   (vst1_p16_x3): Likewise.
>   (vst1_s32_x3): Likewise.
>   (vst1_u8_x3): Likewise.
>   (vst1_u16_x3): Likewise.
>   (vst1_u32_x3): Likewise.
>   (vst1_f16_x3): Likewise.
>   (vst1_f32_x3): Likewise.
>   (vst1_p64_x3): Likewise.
>   (vst1q_s8_x3): Likewise.
>   (vst1q_p8_x3): Likewise.
>   (vst1q_s16_x3): Likewise.
>   (vst1q_p16_x3): Likewise.
>   (vst1q_s32_x3): Likewise.
>   (vst1q_s64_x3): Likewise.
>   (vst1q_u8_x3): Likewise.
>   (vst1q_u16_x3): Likewise.
>   (vst1q_u32_x3): Likewise.
>   (vst1q_u64_x3): Likewise.
>   (vst1q_f16_x3): Likewise.
>   (vst1q_f32_x3): Likewise.
>   (vst1q_f64_x3): Likewise.
>   (vst1q_p64_x3): Likewise.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/aarch64/vector_structure_intrinsics.c: Add new
>   tests.



RE: [PATCH] aarch64: Use memcpy to copy vector tables in vst1[q]_x2 intrinsics

2021-07-23 Thread Kyrylo Tkachov via Gcc-patches



> -Original Message-
> From: Jonathan Wright 
> Sent: 23 July 2021 15:25
> To: gcc-patches@gcc.gnu.org
> Cc: Kyrylo Tkachov ; Richard Sandiford
> 
> Subject: [PATCH] aarch64: Use memcpy to copy vector tables in vst1[q]_x2
> intrinsics
> 
> Hi,
> 
> This patch uses __builtin_memcpy to copy vector structures instead of
> building a new opaque structure one vector at a time in each of the
> vst1[q]_x2 Neon intrinsics in arm_neon.h. This simplifies the header
> file and also improves code generation - superfluous move
> instructions were emitted for every register extraction/set in this
> additional structure.
> 
> Add new code generation tests to verify that superfluous move
> instructions are not generated for the vst1q_x2 intrinsics.
> 
> Regression tested and bootstrapped on aarch64-none-linux-gnu - no
> issues.
> 
> Ok for master?

Ok.
Thanks,
Kyrill

> 
> Thanks,
> Jonathan
> 
> ---
> 
> gcc/ChangeLog:
> 
> 2021-07-23  Jonathan Wright  
> 
>   * config/aarch64/arm_neon.h (vst1_s64_x2): Use
>   __builtin_memcpy instead of constructing
>   __builtin_aarch64_simd_oi one vector at a time.
>   (vst1_u64_x2): Likewise.
>   (vst1_f64_x2): Likewise.
>   (vst1_s8_x2): Likewise.
>   (vst1_p8_x2): Likewise.
>   (vst1_s16_x2): Likewise.
>   (vst1_p16_x2): Likewise.
>   (vst1_s32_x2): Likewise.
>   (vst1_u8_x2): Likewise.
>   (vst1_u16_x2): Likewise.
>   (vst1_u32_x2): Likewise.
>   (vst1_f16_x2): Likewise.
>   (vst1_f32_x2): Likewise.
>   (vst1_p64_x2): Likewise.
>   (vst1q_s8_x2): Likewise.
>   (vst1q_p8_x2): Likewise.
>   (vst1q_s16_x2): Likewise.
>   (vst1q_p16_x2): Likewise.
>   (vst1q_s32_x2): Likewise.
>   (vst1q_s64_x2): Likewise.
>   (vst1q_u8_x2): Likewise.
>   (vst1q_u16_x2): Likewise.
>   (vst1q_u32_x2): Likewise.
>   (vst1q_u64_x2): Likewise.
>   (vst1q_f16_x2): Likewise.
>   (vst1q_f32_x2): Likewise.
>   (vst1q_f64_x2): Likewise.
>   (vst1q_p64_x2): Likewise.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/aarch64/vector_structure_intrinsics.c: Add new
>   tests.
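Here is the idiom outside of arm_neon.h, as a small sketch using GCC's generic vector extension. The types vec4_sketch, vec4x2_sketch and opaque_oi_sketch are hypothetical stand-ins for a Neon vector, a vst1q_x2 tuple argument and the backend's opaque __builtin_aarch64_simd_oi type. The old per-vector construction is modelled with plain memcpy calls rather than the real lane-setting builtins. The sketch only shows that one bulk __builtin_memcpy yields the same bytes. It says nothing about the code generation the patch measured.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical stand-ins: a 128-bit vector and a two-vector tuple.
typedef std::int32_t vec4_sketch __attribute__ ((vector_size (16)));
struct vec4x2_sketch { vec4_sketch val[2]; };

// Hypothetical opaque 256-bit container playing the role of the
// backend's __builtin_aarch64_simd_oi type.
struct opaque_oi_sketch { alignas (16) unsigned char bytes[32]; };

// Old style (modelled): fill the opaque value one vector at a time.
static opaque_oi_sketch pack_per_vector (const vec4x2_sketch &v)
{
  opaque_oi_sketch o;
  std::memcpy (o.bytes, &v.val[0], sizeof (vec4_sketch));
  std::memcpy (o.bytes + sizeof (vec4_sketch), &v.val[1], sizeof (vec4_sketch));
  return o;
}

// New style: a single bulk copy of the whole tuple.
static opaque_oi_sketch pack_memcpy (const vec4x2_sketch &v)
{
  static_assert (sizeof (opaque_oi_sketch) == sizeof (vec4x2_sketch),
                 "tuple and opaque type must have the same size");
  opaque_oi_sketch o;
  __builtin_memcpy (&o, &v, sizeof o);
  return o;
}
```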



[PUSHED] Use range_query object in array bounds class.

2021-07-23 Thread Aldy Hernandez via Gcc-patches
Now that all dependencies of array_bounds_checker take a range_query, we
can sever the relationship with vr_values.  Changing this will allow us
to use the array_bounds_checker with VRP, evrp, or the ranger.

Tested on x86-64 Linux.

Pushed.

gcc/ChangeLog:

* gimple-array-bounds.h (class array_bounds_checker): Change
ranges type to range_query.
---
 gcc/gimple-array-bounds.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/gimple-array-bounds.h b/gcc/gimple-array-bounds.h
index 1bfa2d45870..fa64262777d 100644
--- a/gcc/gimple-array-bounds.h
+++ b/gcc/gimple-array-bounds.h
@@ -25,7 +25,7 @@ class array_bounds_checker
   friend class check_array_bounds_dom_walker;
 
 public:
-  array_bounds_checker (struct function *fun, class vr_values *v)
+  array_bounds_checker (struct function *fun, range_query *v)
 : fun (fun), ranges (v) { }
   void check ();
 
@@ -37,7 +37,7 @@ private:
   const value_range *get_value_range (const_tree op);
 
   struct function *fun;
-  class vr_values *ranges;
+  range_query *ranges;
 };
 
 #endif // GCC_GIMPLE_ARRAY_BOUNDS_H
-- 
2.31.1
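After this change the checker depends only on the abstract range_query interface, so any range provider can be plugged in. Below is a minimal sketch of that pattern. It assumes a hypothetical range_of() method and a toy simple_range type. GCC's actual range_query interface and value_range class differ.

```cpp
#include <cassert>

// Toy range type (hypothetical; not GCC's value_range).
struct simple_range { long lo, hi; };

// Abstract query interface the checker depends on (hypothetical API).
class range_query_sketch
{
public:
  virtual ~range_query_sketch () = default;
  virtual simple_range range_of (int name) const = 0;
};

// One provider, standing in for vr_values / VRP.
class vrp_like_query : public range_query_sketch
{
public:
  simple_range range_of (int) const override { return {0, 9}; }
};

// Another provider, standing in for the ranger.
class ranger_like_query : public range_query_sketch
{
public:
  simple_range range_of (int) const override { return {0, 3}; }
};

class array_bounds_checker_sketch
{
public:
  explicit array_bounds_checker_sketch (range_query_sketch *q) : ranges (q) {}

  // True if every index value the query allows lies in [0, n).
  bool index_in_bounds (int index_name, long n) const
  {
    simple_range r = ranges->range_of (index_name);
    return r.lo >= 0 && r.hi < n;
  }

private:
  range_query_sketch *ranges;  // only the abstract interface is used
};
```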



Re: [PATCH v3] Use range-based for loops for traversing loops

2021-07-23 Thread Martin Sebor via Gcc-patches

On 7/23/21 2:35 AM, Kewen.Lin wrote:

Hi,

Compared to v2, this v3 removes the new CTOR taking struct loops *loops,
as Richi clarified.  I'd like to support that in a separate follow-up
patch by extending the existing CTOR with an optional loop_p root
argument.


Looks very nice (and quite a bit work)!  Thanks again!

Not to make even more work for you, but it occurred to me that
the declaration of the loop control variable could be simplified
by the use of auto like so:

  for (auto loop: loops_list (cfun, ...))
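A range-based loop like this only needs an object with begin() and end(). The sketch below is a hypothetical, much-simplified model: loop_sketch and loops_list_sketch are not GCC's types. The constructor snapshots loop numbers in outer-to-inner preorder. The iterator then looks each number up in a larray-like vector, so `auto` deduces the loop pointer type.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical miniature loop tree; GCC's struct loop is far richer.
struct loop_sketch
{
  int num;
  loop_sketch *inner = nullptr;  // first child loop
  loop_sketch *next = nullptr;   // next sibling loop
};

// The constructor records loop numbers (preorder, outermost first);
// begin()/end() make "for (auto l : loops_list_sketch (...))" work.
class loops_list_sketch
{
public:
  loops_list_sketch (const std::vector<loop_sketch *> &larray,
                     loop_sketch *tree_root)
    : larray_ (larray)
  {
    walk (tree_root->inner);  // the tree root itself is not visited
  }

  class iterator
  {
  public:
    iterator (const loops_list_sketch *list, std::size_t idx)
      : list_ (list), idx_ (idx) {}
    loop_sketch *operator* () const
    { return list_->larray_[list_->to_visit_[idx_]]; }
    iterator &operator++ () { ++idx_; return *this; }
    bool operator!= (const iterator &rhs) const { return idx_ != rhs.idx_; }
  private:
    const loops_list_sketch *list_;
    std::size_t idx_;
  };

  iterator begin () const { return iterator (this, 0); }
  iterator end () const { return iterator (this, to_visit_.size ()); }

private:
  void walk (loop_sketch *l)
  {
    for (; l != nullptr; l = l->next)
      {
        to_visit_.push_back (l->num);
        walk (l->inner);
      }
  }

  const std::vector<loop_sketch *> &larray_;
  std::vector<int> to_visit_;
};

// Usage with auto, as suggested above.
inline std::vector<int> visit_order (const std::vector<loop_sketch *> &larray)
{
  std::vector<int> order;
  for (auto l : loops_list_sketch (larray, larray[0]))
    order.push_back (l->num);
  return order;
}
```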

I spotted what looks to me like a few minor typos in the docs
diff:

diff --git a/gcc/doc/loop.texi b/gcc/doc/loop.texi
index a135656ed01..27697b08728 100644
--- a/gcc/doc/loop.texi
+++ b/gcc/doc/loop.texi
@@ -79,14 +79,14 @@ and its subloops in the numbering.  The index of a loop never changes.


 The entries of the @code{larray} field should not be accessed directly.
 The function @code{get_loop} returns the loop description for a loop with
-the given index.  @code{number_of_loops} function returns number of
-loops in the function.  To traverse all loops, use @code{FOR_EACH_LOOP}
-macro.  The @code{flags} argument of the macro is used to determine
-the direction of traversal and the set of loops visited.  Each loop is
-guaranteed to be visited exactly once, regardless of the changes to the
-loop tree, and the loops may be removed during the traversal.  The newly
-created loops are never traversed, if they need to be visited, this
-must be done separately after their creation.
+the given index.  @code{number_of_loops} function returns number of loops
+in the function.  To traverse all loops, use range-based for loop with

Missing article:

   use a range-based for loop

+class @code{loop_list} instance. The @code{flags} argument of the macro

Is that loop_list or loops_list?

IIUC, it's also not a macro anymore, right?  The flags argument
is passed to the loop_list ctor, no?

Martin



Bootstrapped and regtested again on powerpc64le-linux-gnu P9,
x86_64-redhat-linux and aarch64-linux-gnu, also
bootstrapped again on ppc64le P9 with bootstrap-O3 config.

Is it ok for trunk?

BR,
Kewen
-
gcc/ChangeLog:

* cfgloop.h (as_const): New function.
(class loop_iterator): Rename to ...
(class loops_list): ... this.
(loop_iterator::next): Rename to ...
(loops_list::Iter::fill_curr_loop): ... this and adjust.
(loop_iterator::loop_iterator): Rename to ...
(loops_list::loops_list): ... this and adjust.
(loops_list::Iter): New class.
(loops_list::iterator): New type.
(loops_list::const_iterator): New type.
(loops_list::begin): New function.
(loops_list::end): Likewise.
(loops_list::begin const): Likewise.
(loops_list::end const): Likewise.
(FOR_EACH_LOOP): Remove.
(FOR_EACH_LOOP_FN): Remove.
* cfgloop.c (flow_loops_dump): Adjust FOR_EACH_LOOP* with range-based
for loop with loops_list instance.
(sort_sibling_loops): Likewise.
(disambiguate_loops_with_multiple_latches): Likewise.
(verify_loop_structure): Likewise.
* cfgloopmanip.c (create_preheaders): Likewise.
(force_single_succ_latches): Likewise.
* config/aarch64/falkor-tag-collision-avoidance.c
(execute_tag_collision_avoidance): Likewise.
* config/mn10300/mn10300.c (mn10300_scan_for_setlb_lcc): Likewise.
* config/s390/s390.c (s390_adjust_loops): Likewise.
* doc/loop.texi: Likewise.
* gimple-loop-interchange.cc (pass_linterchange::execute): Likewise.
* gimple-loop-jam.c (tree_loop_unroll_and_jam): Likewise.
* gimple-loop-versioning.cc (loop_versioning::analyze_blocks): Likewise.
(loop_versioning::make_versioning_decisions): Likewise.
* gimple-ssa-split-paths.c (split_paths): Likewise.
* graphite-isl-ast-to-gimple.c (graphite_regenerate_ast_isl): Likewise.
* graphite.c (canonicalize_loop_form): Likewise.
(graphite_transform_loops): Likewise.
* ipa-fnsummary.c (analyze_function_body): Likewise.
* ipa-pure-const.c (analyze_function): Likewise.
* loop-doloop.c (doloop_optimize_loops): Likewise.
* loop-init.c (loop_optimizer_finalize): Likewise.
(fix_loop_structure): Likewise.
* loop-invariant.c (calculate_loop_reg_pressure): Likewise.
(move_loop_invariants): Likewise.
* loop-unroll.c (decide_unrolling): Likewise.
(unroll_loops): Likewise.
* modulo-sched.c (sms_schedule): Likewise.
* predict.c (predict_loops): Likewise.
(pass_profile::execute): Likewise.
* profile.c (branch_prob): Likewise.
* sel-sched-ir.c (sel_finish_pipelining): Likewise.
(sel_find_rgns): Likewise.
* tree-cfg.c (replace_loop_annotate): Likewise.
(replace_uses_by): Likewise.
(move_sese_region_to_fn): Likewise.
* tree-if-conv.c (pass_if_conversion::execute): Like

Re: [PATCH] expmed: Fix store_integral_bit_field [PR101562]

2021-07-23 Thread Jeff Law via Gcc-patches




On 7/23/2021 1:49 AM, Jakub Jelinek wrote:

Hi!

Our documentation says that paradoxical subregs shouldn't appear
in strict_low_part:
'(strict_low_part (subreg:M (reg:N R) 0))'
  This expression code is used in only one context: as the
  destination operand of a 'set' expression.  In addition, the
  operand of this expression must be a non-paradoxical 'subreg'
  expression.
but on the testcase below, which triggers UB at runtime,
store_integral_bit_field emits exactly that.

The following patch fixes it by ensuring the requirement is satisfied.
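For reference, a subreg is paradoxical when its own mode is wider than the mode of the register it wraps, e.g. (subreg:DI (reg:SI r) 0). Below is a toy model of the guard the patch describes, with modes reduced to byte sizes. The names subreg_sketch, paradoxical_p and can_use_movstrict are hypothetical, not expmed.c functions.

```cpp
#include <cassert>

// Hypothetical model: modes are represented only by their byte size.
struct subreg_sketch
{
  unsigned outer_bytes;  // size of the subreg's own mode (M)
  unsigned inner_bytes;  // size of the inner register's mode (N)
};

// Paradoxical: the outer mode is strictly wider than the inner mode.
inline bool paradoxical_p (const subreg_sketch &s)
{
  return s.outer_bytes > s.inner_bytes;
}

// Mirrors the described fix: a strict_low_part (movstrict) destination
// is only acceptable when the subreg is not paradoxical; otherwise the
// caller must fall back to another store strategy.
inline bool can_use_movstrict (const subreg_sketch &dest)
{
  return !paradoxical_p (dest);
}
```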

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2021-07-23  Jakub Jelinek  

PR rtl-optimization/101562
* expmed.c (store_integral_bit_field): Only use movstrict_optab
if the operand isn't paradoxical.

* gcc.c-torture/compile/pr101562.c: New test.

OK
jeff



Re: [PATCH] Make loops_list support an optional loop_p root

2021-07-23 Thread Martin Sebor via Gcc-patches

On 7/23/21 2:41 AM, Kewen.Lin wrote:

on 2021/7/22 8:56 PM, Richard Biener wrote:

On Tue, Jul 20, 2021 at 4:37 PM Kewen.Lin  wrote:


Hi,

This v2 has addressed some review comments/suggestions:

   - Use "!=" instead of "<" in function operator!= (const Iter &rhs)
   - Add new CTOR loops_list (struct loops *loops, unsigned flags)
 to support loop hierarchy tree rather than just a function,
 and adjust to use loops* accordingly.


I actually meant struct loop *, not struct loops * ;)  When we
pondered making loop invariant motion work on single loop nests,
we gave up partly because it iterates over the loop nest, but all
the iterators can only ever process all loops, not, say, all loops
inside a specific 'loop' (and including that 'loop' if
LI_INCLUDE_ROOT).  So the CTOR would take the 'root' of the loop
tree as an argument.

I see that doesn't trivially fit how loops_list works, at least
not for LI_ONLY_INNERMOST.  But I guess FROM_INNERMOST
could be adjusted to do ONLY_INNERMOST as well?




Thanks for the clarification!  I just realized that the previous
version with struct loops* is problematic: all traversal is
still bounded by outer_loop == NULL.  I think what you expect
is to respect the given loop_p root boundary.  Since we only
record the loops' nums, I think we still need the function* fn?
So I added one optional argument loop_p root and updated the
visiting code accordingly.  Before this change, the traversal
used outer_loop == NULL as the termination condition, which
naturally includes the root itself, but with a given root
we have to use the root as the termination condition to avoid
iterating onto its next sibling, if one exists.

For LI_ONLY_INNERMOST, I was thinking whether we can use the
code like:

 vec<loop_p, va_gc> *fn_loops = loops_for_fn (fn)->larray;
 for (i = 0; vec_safe_iterate (fn_loops, i, &aloop); i++)
   if (aloop != NULL
       && aloop->inner == NULL
       && flow_loop_nested_p (tree_root, aloop))
     this->to_visit.quick_push (aloop->num);

It has a stable bound, but if the given root has only a few child
loops while fn has many loops, it can be much slower.
The size of the given root's loop hierarchy can't be predicted;
maybe we can still use the original linear search for the case
loops_for_fn (fn) == root?  But since this traversal doesn't seem
very performance-critical, I chose to share the code originally used
for FROM_INNERMOST, in the hope of better readability and
maintainability.
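The termination issue shows up in a small model of a walk over inner/next/outer links. In this hypothetical sketch (loop_node and innermost_in_subtree are not GCC names), the walk collects the innermost loops below ROOT. It stops when it climbs back to ROOT instead of when outer becomes NULL, so ROOT's own siblings are never visited. If ROOT has no inner loops, it counts as innermost itself.

```cpp
#include <cassert>
#include <vector>

// Hypothetical miniature loop tree node.
struct loop_node
{
  int num;
  loop_node *outer = nullptr;
  loop_node *inner = nullptr;  // first child
  loop_node *next = nullptr;   // next sibling
};

// Collect the numbers of the innermost loops in the subtree rooted at
// ROOT, bounded by ROOT rather than by outer == NULL.
inline std::vector<int> innermost_in_subtree (loop_node *root)
{
  std::vector<int> nums;
  loop_node *l = root;
  while (true)
    {
      // Descend to the first innermost loop below L.
      while (l->inner)
        l = l->inner;
      nums.push_back (l->num);
      // Climb until a next sibling exists, but never past ROOT.
      while (l != root && l->next == nullptr)
        l = l->outer;
      if (l == root)
        break;
      l = l->next;
    }
  return nums;
}
```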


I might be mixing up the two patches (they both seem to touch
the same functions), but in this one the loops_list ctor looks
like a sizeable function with at least one loop.  Since the ctor
is used in the initialization of each of the many range-for loops,
that could result in inlining a lot of these calls and so quite
a bit of code bloat.  Unless this is necessary for efficiency (not
my area) I would recommend considering defining the loops_list
ctor out-of-line in some .c or .cc file.

(Also, if you agree with the rationale, I'd replace loop_p with
loop * in the new code.)

Thanks
Martin



Bootstrapped and regtested on powerpc64le-linux-gnu P9,
x86_64-redhat-linux and aarch64-linux-gnu, also
bootstrapped on ppc64le P9 with bootstrap-O3 config.

Does the attached patch meet what you expect?

BR,
Kewen
-
gcc/ChangeLog:

* cfgloop.h (loops_list::loops_list): Add one optional argument root
and adjust accordingly.





Re: 0001-Don-t-skip-prologue-instructions-as-it-could-affect-.patch

2021-07-23 Thread Jeff Law via Gcc-patches




On 7/22/2021 7:04 AM, Richard Biener via Gcc-patches wrote:

On Thu, Jul 22, 2021 at 9:02 AM Bin.Cheng via Gcc-patches
 wrote:

Gentle ping.  Any suggestions would be appreciated.

So just to say something - does the existing code mean that any use of
the alias info on prologue/epilogue insns is wrong?  We have

   /* The prologue/epilogue insns are not threaded onto the
  insn chain until after reload has completed.  Thus,
  there is no sense wasting time checking if INSN is in
  the prologue/epilogue until after reload has completed.  */
   bool could_be_prologue_epilogue = ((targetm.have_prologue ()
   || targetm.have_epilogue ())
  && reload_completed);

so when !could_be_prologue_epilogue then passes shouldn't run into
them if the comment is correct.  But else even epilogue stmts could appear
anywhere (like scheduled around)?  So why's skipping those OK?
These insns don't exist until after reload has completed.  I think this 
code is just trying to be more compile-time efficient and not look for 
them when they're known to not exist.


As for why they're skipped?  That seems wrong to me.  That was added by 
Kenner:


https://gcc.gnu.org/pipermail/gcc-patches/2000-May/031560.html

Interestingly enough, I (and others) preserved that behavior through 
several updates to this code.




Are passes supposed to check whether they are dealing with pro/epilogue
insns and not touch them?  CCing people that might know.

Generally most passes can treat them as any other RTL.

Jeff



Re: 0001-Don-t-skip-prologue-instructions-as-it-could-affect-.patch

2021-07-23 Thread Jeff Law via Gcc-patches




On 7/14/2021 3:14 AM, bin.cheng via Gcc-patches wrote:

Hi,
I ran into a wrong-code bug in code with deep template instantiation when working on sdx::simd.
The root cause, as described in the commit summary, is that we skip prologue insns in init_alias_analysis.
This simple patch fixes the issue; however, it's hard to reduce a test case because of the heavy use of templates.
Bootstrap and test on x86_64, is it OK?
It's a clear correctness improvement, but what's unclear to me is why 
we'd want to skip them in the epilogue either.


Jeff


Re: [PATCH] c++: suppress all warnings on member pointers to work around dICE [PR101219]

2021-07-23 Thread Jeff Law via Gcc-patches




On 7/22/2021 5:15 PM, Sergei Trofimovich via Gcc-patches wrote:

From: Sergei Trofimovich 

r12-1804 ("cp: add support for per-location warning groups.") among other
things removed warning suppression from a few places including ptrmemfuncs.

Currently ptrmemfuncs don't have valid BINFO attached which causes ICEs
in access checks:

 crash_signal
 gcc/toplev.c:328
 perform_or_defer_access_check(tree_node*, tree_node*, tree_node*, int, access_failure_info*)
 gcc/cp/semantics.c:490
 finish_non_static_data_member(tree_node*, tree_node*, tree_node*)
 gcc/cp/semantics.c:2208
 ...

The change suppresses warnings again until we provide BINFOs for ptrmemfuncs.

PR c++/101219

gcc/cp/ChangeLog:

* typeck.c (build_ptrmemfunc_access_expr): Suppress all warnings
to avoid ICE.

gcc/testsuite/ChangeLog:

* g++.dg/torture/pr101219.C: New test.
The C++ maintainers have the final say here, but ISTM that warning 
suppression shouldn't be used to avoid an ICE, even an ICE within the 
warning or diagnostic code itself.


jeff



[PATCH] libstdc++: Fix up implementation of LWG 3533 [PR101589]

2021-07-23 Thread Patrick Palka via Gcc-patches
In r12-569 I accidentally applied the LWG 3533 change that
was intended for elements_view::iterator::base to
elements_view::base instead.

This patch corrects this, and also applies the corresponding LWG 3533
change to lazy_split_view::inner-iter::base now that we implement P2210.

Tested on x86_64-pc-linux-gnu, does this look OK for trunk and release
branches?

PR libstdc++/101589

libstdc++-v3/ChangeLog:

* include/std/ranges (lazy_split_view::_InnerIter::base): Make
the const& overload unconstrained and return a const reference
as per LWG 3533.  Make unconditionally noexcept.
(elements_view::base): Revert accidental r12-569 change.
(elements_view::_Iterator::base): Make the const& overload
unconstrained and return a const reference as per LWG 3533.
Make unconditionally noexcept.
---
 libstdc++-v3/include/std/ranges | 13 ++---
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/libstdc++-v3/include/std/ranges b/libstdc++-v3/include/std/ranges
index d791e15d096..50b414e8c8c 100644
--- a/libstdc++-v3/include/std/ranges
+++ b/libstdc++-v3/include/std/ranges
@@ -3103,8 +3103,8 @@ namespace views::__adaptor
: _M_i(std::move(__i))
  { }
 
- constexpr iterator_t<_Base>
- base() const& requires copyable<iterator_t<_Base>>
+ constexpr const iterator_t<_Base>&
+ base() const& noexcept
  { return _M_i_current(); }
 
  constexpr iterator_t<_Base>
@@ -3786,8 +3786,8 @@ namespace views::__adaptor
: _M_base(std::move(base))
   { }
 
-  constexpr const _Vp&
-  base() const & noexcept
+  constexpr _Vp
+  base() const& requires copy_constructible<_Vp>
   { return _M_base; }
 
   constexpr _Vp
@@ -3913,9 +3913,8 @@ namespace views::__adaptor
: _M_current(std::move(i._M_current))
  { }
 
- constexpr iterator_t<_Base>
- base() const&
-   requires copyable<iterator_t<_Base>>
+ constexpr const iterator_t<_Base>&
+ base() const& noexcept
  { return _M_current; }
 
  constexpr iterator_t<_Base>
-- 
2.32.0.349.gdaab8a564f
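The const& overload can drop its constraint because returning a const reference never copies the underlying iterator, so it works for move-only iterators too. Below is a reduced, hypothetical adaptor showing the post-LWG 3533 shape of base(). The names adaptor_iter and move_only_iter are illustrative, not libstdc++ internals.

```cpp
#include <cassert>
#include <utility>

// Hypothetical move-only iterator-like type.
struct move_only_iter
{
  int pos = 0;
  move_only_iter () = default;
  explicit move_only_iter (int p) : pos (p) {}
  move_only_iter (move_only_iter &&) = default;
  move_only_iter &operator= (move_only_iter &&) = default;
  move_only_iter (const move_only_iter &) = delete;
  move_only_iter &operator= (const move_only_iter &) = delete;
};

// Adaptor reduced to its base() members, in the style of
// elements_view::_Iterator after LWG 3533.
template<typename It>
class adaptor_iter
{
public:
  explicit adaptor_iter (It i) : current_ (std::move (i)) {}

  // Previously returned It by value and required copyable<It>;
  // a const reference needs no copy, so no constraint is needed.
  constexpr const It &base () const & noexcept { return current_; }

  // The rvalue overload moves the underlying iterator out.
  constexpr It base () && { return std::move (current_); }

private:
  It current_;
};
```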



[PATCH] libstdc++: Add missing std::move to join_view::iterator ctor [PR101483]

2021-07-23 Thread Patrick Palka via Gcc-patches
Tested on x86_64-pc-linux-gnu, does this look OK for trunk/branches?

PR libstdc++/101483

libstdc++-v3/ChangeLog:

* include/std/ranges (join_view::_Iterator::_Iterator): Add
missing std::move.
---
 libstdc++-v3/include/std/ranges | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libstdc++-v3/include/std/ranges b/libstdc++-v3/include/std/ranges
index 50b414e8c8c..5bdcd445a9e 100644
--- a/libstdc++-v3/include/std/ranges
+++ b/libstdc++-v3/include/std/ranges
@@ -2588,7 +2588,7 @@ namespace views::__adaptor
requires _Const
  && convertible_to, _Outer_iter>
  && convertible_to, _Inner_iter>
-   : _M_outer(std::move(__i._M_outer)), _M_inner(__i._M_inner),
+   : _M_outer(std::move(__i._M_outer)), _M_inner(std::move(__i._M_inner)),
  _M_parent(__i._M_parent)
  { }
 
-- 
2.32.0.349.gdaab8a564f
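The missing std::move matters when the stored iterators are move-only. A converting constructor that names such a member without moving it tries to copy it and fails to compile. In the minimal sketch below, std::unique_ptr stands in for a move-only iterator. The hypothetical iter_nonconst and iter_const types take the place of join_view's non-const and const _Iterator.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Stand-in for the non-const iterator; unique_ptr models a
// move-only outer/inner iterator.
struct iter_nonconst
{
  std::unique_ptr<int> outer, inner;
  const void *parent;
};

// Stand-in for the const iterator.
struct iter_const
{
  std::unique_ptr<int> outer, inner;
  const void *parent;

  // Converting constructor (non-const -> const iterator).  Every
  // move-only member is moved; copying i.inner, as the pre-patch code
  // did for _M_inner, would not compile for a move-only member.
  iter_const (iter_nonconst i)
    : outer (std::move (i.outer)), inner (std::move (i.inner)),
      parent (i.parent) {}
};
```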




[PATCH] libstdc++: Add missing std::move in ranges::copy/move/reverse_copy [PR101599]

2021-07-23 Thread Patrick Palka via Gcc-patches
In passing, this also renames the template parameter _O2 to _Out2 in
ranges::partition_copy and uglifies its function parameters out_true
and out_false.

Tested on x86_64-pc-linux-gnu, does this look OK for trunk+branches?

PR libstdc++/101599

libstdc++-v3/ChangeLog:

* include/bits/ranges_algo.h (__reverse_copy_fn::operator()):
Add missing std::move in return statement.
(__partition_copy_fn::operator()): Rename template parameter
_O2 to _Out2.  Uglify function parameters out_true and out_false.
* include/bits/ranges_algobase.h (__copy_or_move): Add missing
std::move to recursive call that unwraps a __normal_iterator
output iterator.
* testsuite/25_algorithms/copy/constrained.cc (test06): New test.
* testsuite/25_algorithms/move/constrained.cc (test05): New test.
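The ranges_algobase.h fix follows a general rule. When a generic algorithm forwards a by-value input iterator to a recursive or helper call, it has to std::move it, or a move-only input iterator fails to compile. Below is a simplified, hypothetical model of the output-iterator unwrapping step. The names copy_sketch, wrapped_out and move_only_input are illustrative, not libstdc++ code, and wrapped_out plays the role of __normal_iterator.

```cpp
#include <cassert>
#include <utility>

// Hypothetical move-only input iterator over an int array.
struct move_only_input
{
  const int *p;
  explicit move_only_input (const int *q) : p (q) {}
  move_only_input (move_only_input &&) = default;
  move_only_input &operator= (move_only_input &&) = default;
  move_only_input (const move_only_input &) = delete;
  move_only_input &operator= (const move_only_input &) = delete;
  const int &operator* () const { return *p; }
  move_only_input &operator++ () { ++p; return *this; }
  bool operator!= (const int *end) const { return p != end; }
};

// Output "wrapper" to be unwrapped, like __normal_iterator.
struct wrapped_out { int *p; };

template<typename In, typename Out>
struct copy_result_sketch { In in; Out out; };

// Base case: element-by-element copy to a raw pointer.
template<typename In>
copy_result_sketch<In, int *> copy_sketch (In first, const int *last, int *out)
{
  for (; first != last; ++first, (void) ++out)
    *out = *first;
  return {std::move (first), out};
}

// Unwrapping case: forward to the raw-pointer version.  FIRST must be
// moved; passing it by name would try to copy a move-only iterator.
template<typename In>
copy_result_sketch<In, wrapped_out>
copy_sketch (In first, const int *last, wrapped_out out)
{
  copy_result_sketch<In, int *> r = copy_sketch (std::move (first), last, out.p);
  return {std::move (r.in), wrapped_out{r.out}};
}
```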
---
 libstdc++-v3/include/bits/ranges_algo.h   | 20 +--
 libstdc++-v3/include/bits/ranges_algobase.h   |  2 +-
 .../25_algorithms/copy/constrained.cc | 13 
 .../25_algorithms/move/constrained.cc | 13 
 4 files changed, 37 insertions(+), 11 deletions(-)

diff --git a/libstdc++-v3/include/bits/ranges_algo.h b/libstdc++-v3/include/bits/ranges_algo.h
index 83371a4bdf0..8462521c369 100644
--- a/libstdc++-v3/include/bits/ranges_algo.h
+++ b/libstdc++-v3/include/bits/ranges_algo.h
@@ -1343,7 +1343,7 @@ namespace ranges
*__result = *__tail;
++__result;
  }
-   return {__i, __result};
+   return {__i, std::move(__result)};
   }
 
 template
@@ -2423,14 +2423,14 @@ namespace ranges
   struct __partition_copy_fn
   {
 template _Sent,
-weakly_incrementable _Out1, weakly_incrementable _O2,
+weakly_incrementable _Out1, weakly_incrementable _Out2,
 typename _Proj = identity,
 indirect_unary_predicate> _Pred>
   requires indirectly_copyable<_Iter, _Out1>
-   && indirectly_copyable<_Iter, _O2>
-  constexpr partition_copy_result<_Iter, _Out1, _O2>
+   && indirectly_copyable<_Iter, _Out2>
+  constexpr partition_copy_result<_Iter, _Out1, _Out2>
   operator()(_Iter __first, _Sent __last,
-_Out1 __out_true, _O2 __out_false,
+_Out1 __out_true, _Out2 __out_false,
 _Pred __pred, _Proj __proj = {}) const
   {
for (; __first != __last; ++__first)
@@ -2450,18 +2450,18 @@ namespace ranges
   }
 
 template, _Proj>>
   _Pred>
   requires indirectly_copyable, _Out1>
-   && indirectly_copyable, _O2>
-  constexpr partition_copy_result, _Out1, _O2>
-  operator()(_Range&& __r, _Out1 out_true, _O2 out_false,
+   && indirectly_copyable, _Out2>
+  constexpr partition_copy_result, _Out1, _Out2>
+  operator()(_Range&& __r, _Out1 __out_true, _Out2 __out_false,
 _Pred __pred, _Proj __proj = {}) const
   {
return (*this)(ranges::begin(__r), ranges::end(__r),
-  std::move(out_true), std::move(out_false),
+  std::move(__out_true), std::move(__out_false),
   std::move(__pred), std::move(__proj));
   }
   };
diff --git a/libstdc++-v3/include/bits/ranges_algobase.h b/libstdc++-v3/include/bits/ranges_algobase.h
index c1037657c4c..78c295981d5 100644
--- a/libstdc++-v3/include/bits/ranges_algobase.h
+++ b/libstdc++-v3/include/bits/ranges_algobase.h
@@ -244,7 +244,7 @@ namespace ranges
   else if constexpr (__is_normal_iterator<_Out>)
{
  auto [__in,__out]
-   = ranges::__copy_or_move<_IsMove>(__first, __last, __result.base());
+   = ranges::__copy_or_move<_IsMove>(std::move(__first), __last, __result.base());
  return {std::move(__in), decltype(__result){__out}};
}
   else if constexpr (sized_sentinel_for<_Sent, _Iter>)
diff --git a/libstdc++-v3/testsuite/25_algorithms/copy/constrained.cc b/libstdc++-v3/testsuite/25_algorithms/copy/constrained.cc
index 77ecf99d5b1..a05948a49c6 100644
--- a/libstdc++-v3/testsuite/25_algorithms/copy/constrained.cc
+++ b/libstdc++-v3/testsuite/25_algorithms/copy/constrained.cc
@@ -26,6 +26,7 @@
 using __gnu_test::test_container;
 using __gnu_test::test_range;
 using __gnu_test::input_iterator_wrapper;
+using __gnu_test::input_iterator_wrapper_nocopy;
 using __gnu_test::output_iterator_wrapper;
 using __gnu_test::forward_iterator_wrapper;
 
@@ -214,6 +215,17 @@ test05()
   return ok;
 }
 
+void
+test06()
+{
+  // PR libstdc++/101599
+  int x[] = {1,2,3};
+  test_range rx(x);
+  std::vector v(4, 0);
+  ranges::copy(rx, v.begin());
+  VERIFY( ranges::equal(v, (int[]){1,2,3,0}) );
+}
+
 int
 main()
 {
@@ -222,4 +234,5 @@ main()
   static_assert(test03());
   test04();
   static_assert(test05());
+  test06();
 }
diff --git a/libstdc++-v3/testsuite/25_algorithms/move/constrained.cc b/libstdc++-v3/testsuite/25_algorithms/move/constrained.cc
index 1cdfbdf23bc..2

Re: [PATCH] libstdc++: Add missing std::move to join_view::iterator ctor [PR101483]

2021-07-23 Thread Jonathan Wakely via Gcc-patches
On Fri, 23 Jul 2021 at 17:36, Patrick Palka via Libstdc++
 wrote:
>
> Tested on x86_64-pc-linux-gnu, does this look OK for trunk/branches?

Yes, thanks.


>
> PR libstdc++/101483
>
> libstdc++-v3/ChangeLog:
>
> * include/std/ranges (join_view::_Iterator::_Iterator): Add
> missing std::move.
> ---
>  libstdc++-v3/include/std/ranges | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/libstdc++-v3/include/std/ranges b/libstdc++-v3/include/std/ranges
> index 50b414e8c8c..5bdcd445a9e 100644
> --- a/libstdc++-v3/include/std/ranges
> +++ b/libstdc++-v3/include/std/ranges
> @@ -2588,7 +2588,7 @@ namespace views::__adaptor
> requires _Const
>   && convertible_to, _Outer_iter>
>   && convertible_to, _Inner_iter>
> -   : _M_outer(std::move(__i._M_outer)), _M_inner(__i._M_inner),
> +   : _M_outer(std::move(__i._M_outer)), _M_inner(std::move(__i._M_inner)),
>   _M_parent(__i._M_parent)
>   { }
>
> --
> 2.32.0.349.gdaab8a564f
>



Re: [PATCH] correct uninitialized object offset and size computation [PR101494]

2021-07-23 Thread Jeff Law via Gcc-patches




On 7/22/2021 3:58 PM, Martin Sebor via Gcc-patches wrote:

The code that computes the size of an access to an object in
-Wuninitialized is limited to declared objects and so doesn't
apply to allocated objects, and doesn't correctly account for
an offset into the object and the access size.  This causes
false positives.
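The offset and size matter because an access only reads the bytes it actually covers. Below is a hypothetical model, not tree-ssa-uninit.c code, in which only the first init_bytes of an object are initialized. A zero-size access such as memrchr (p, c, 0) reads nothing and so should not be diagnosed.

```cpp
#include <cassert>
#include <cstddef>

// Bytes of an OBJSIZE-byte object actually touched by an access of
// SIZE bytes starting at byte OFFSET (0 if the access is empty or
// starts past the end of the object).
inline std::size_t bytes_accessed (std::size_t objsize, std::size_t offset,
                                   std::size_t size)
{
  if (offset >= objsize)
    return 0;
  std::size_t avail = objsize - offset;
  return size < avail ? size : avail;
}

// True if the access reads any byte at or beyond INIT_BYTES, assuming
// only the first INIT_BYTES of the object were initialized.
inline bool reads_uninitialized (std::size_t objsize, std::size_t init_bytes,
                                 std::size_t offset, std::size_t size)
{
  std::size_t n = bytes_accessed (objsize, offset, size);
  return n != 0 && offset + n > init_bytes;
}
```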

The attached fix tested on x86_64-linux corrects this.

Martin

gcc-101494.diff

Correct uninitialized object offset and size computation [PR101494].

Resolves:
PR middle-end/101494 - -Wuninitialized false alarm with memrchr of size 0

gcc/ChangeLog:

PR middle-end/101494
* tree-ssa-uninit.c (builtin_call_nomodifying_p):
(check_defs):
(maybe_warn_operand):

gcc/testsuite/ChangeLog:

PR middle-end/101494
* gcc.dg/uninit-38.c:
* gcc.dg/uninit-41.c: New test.
* gcc.dg/uninit-pr101494.c: New test.
OK once you complete the ChangeLog entry for the tree-ssa-uninit.c 
change.  Note this change only modifies maybe_warn_operand.


jeff



PING [PATCH] add access warning pass

2021-07-23 Thread Martin Sebor via Gcc-patches

Ping: https://gcc.gnu.org/pipermail/gcc-patches/2021-July/575377.html

On 7/15/21 4:39 PM, Martin Sebor wrote:

A number of access warnings as well as their supporting
infrastructure (compute_objsize et al.) are implemented in
builtins.{c,h} where they (mostly) operate on trees and run
just before RTL expansion.

This setup may have made sense initially when the warnings were
very simple and didn't perform any CFG analysis, but it's becoming
a liability.  The code has grown both in size and in complexity,
might need to examine the CFG to improve detection, and in some
cases might achieve a better S/R ratio if run earlier.  Running
the warning code on trees is also slower because it doesn't
benefit from the SSA_NAME caching provided by the pointer_query
class.  Finally, having the code there is also an impediment to
maintainability as warnings and builtin expansion are unrelated
to each other and contributors to one area shouldn't need to wade
through unrelated code (similar for patch reviewers).

The attached change introduces a new warning pass and a couple of
new source and header files and, as the first step, moves the warning
code from builtins.{c,h} there.  To keep the initial changes as
simple as possible the pass only runs a subset of existing
warnings: -Wfree-nonheap-object, -Wmismatched-dealloc, and
-Wmismatched-new-delete.  The others (-Wstringop-overflow and
-Wstringop-overread) still run on the tree representation and
are still invoked from builtins.c or elsewhere.

The changes have no functional impact either on codegen or on
warnings.  I tested them on x86_64-linux.

As the next step I plan to change the -Wstringop-overflow and
-Wstringop-overread code to run on the GIMPLE IL in the new pass
instead of on trees in builtins.c.

Martin

PS The builtins.c diff produced by git diff was much bigger than
the changes justify.  It seems that the code removal somehow
confused it.  To make review easy I replaced it with a plain
unified diff of builtins.c that doesn't suffer from the problem.




Re: [PATCH] tree-optimization/101573 - improve uninit warning at -O0

2021-07-23 Thread Jeff Law via Gcc-patches




On 7/22/2021 6:34 AM, Richard Biener wrote:

We can improve uninit warnings from the early pass by looking
at PHI arguments on fallthru edges that are uninitialized and
have uses that are before a possible loop exit.  This catches
some cases earlier that we'd only warn in a more confusing
way after early inlining as seen by testcase adjustments.

It introduces

FAIL: gcc.dg/uninit-23.c (test for excess errors)

where we additionally warn

gcc.dg/uninit-23.c:21:13: warning: 't4' is used uninitialized [-Wuninitialized]

which I think is OK even if it's not obvious that the new
warning is an improvement when you look at the obvious source.

For some reason, in none of the cases do I get the `'foo' was declared here`
notes; I didn't dig into why that happens, but it's odd.

Bootstrapped and tested on x86_64-unknown-linux-gnu.

Any comments?

Thanks,
Richard.

2021-07-22  Richard Biener  

PR tree-optimization/101573
* tree-ssa-uninit.c (warn_uninitialized_vars): Look at
uninitialized PHI arg defs in some constrained cases.
(execute_early_warn_uninitialized): Calculate dominators.

* gcc.dg/uninit-pr101573.c: New testcase.
* gcc.dg/uninit-15-O0.c: Adjust.
* gcc.dg/uninit-15.c: Likewise.
* gcc.dg/uninit-23.c: Likewise.
* c-c++-common/uninit-17.c: Likewise.
OK.  Like Martin I think the new code in a function would be easier to 
read.  So if you could factor the bits into a new function it'd be 
appreciated.


I wouldn't be terribly surprised if we find additional fallout on other 
targets, but we can fault in those fixes.


Jeff




Re: [PATCH] include: Fix -Wundef warnings in ansidecl.h

2021-07-23 Thread Jeff Law via Gcc-patches




On 7/20/2021 4:01 PM, Marek Polacek via Gcc-patches wrote:

This quashes -Wundef warnings in ansidecl.h when compiled in C or C++.
In C, __cpp_constexpr and __cplusplus aren't defined so we evaluate
them to 0; conversely, __STDC_VERSION__ is not defined in C++.
This has caused grief when -Wundef is used with -Werror.
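The general technique is to test a macro with `defined` before using its value. This relies on GCC's preprocessor not evaluating, and so not warning about, the right-hand side of a short-circuited &&. Below is a sketch with hypothetical SKETCH_* macros, not the actual ansidecl.h macros, compiled here as C++.

```cpp
#include <cassert>

// Only read __cpp_constexpr once it is known to exist.
#if defined (__cplusplus) && defined (__cpp_constexpr) && __cpp_constexpr >= 200704
# define SKETCH_CONSTEXPR constexpr
#else
# define SKETCH_CONSTEXPR
#endif

// __STDC_VERSION__ is not defined in C++, so only test it in C.
#if !defined (__cplusplus) && defined (__STDC_VERSION__) && __STDC_VERSION__ >= 199901L
# define SKETCH_C99 1
#else
# define SKETCH_C99 0
#endif

SKETCH_CONSTEXPR inline int sketch_answer () { return 42; }
```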

I've also tested -traditional-cpp.

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

include/ChangeLog:

* ansidecl.h: Check if __cplusplus is defined before checking
the value of __cpp_constexpr and __cplusplus.  Don't check
__STDC_VERSION__ in C++.

OK
jeff



Re: [PATCH V3] Use preferred mode for doloop IV [PR61837]

2021-07-23 Thread Jeff Law via Gcc-patches




On 7/15/2021 4:08 AM, Jiufu Guo via Gcc-patches wrote:

Refine code for V2 according to review comments:
* Use an if check instead of an assert, and refine the assert
* Use better RE checks in the test case, e.g. (?n)/(?p)
* Use better wording for target.def

Currently, the doloop.xx variable uses the same type as niter, which may be
shorter than the word size.  For some targets, it would be better to use a
word-size type.  For example, on a 64-bit system, accessing a 32-bit value
may require a subreg.  So using a 64-bit type may be better for niter if
the value can be represented in both 32 and 64 bits.

This patch adds a target hook for querying the preferred mode for the
doloop IV, and updates the mode accordingly.
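As a rough illustration of the hook-with-default pattern, the sketch below models modes as a tiny enum and the hook as a function pointer. The names int_mode and target_hooks_sketch and the function names are hypothetical, and the real hook's signature in target.def is not reproduced. It also omits a check that a real implementation needs: that the iteration count fits in the chosen mode.

```cpp
#include <cassert>

// Hypothetical stand-in for machine modes.
enum class int_mode { SI = 32, DI = 64 };

struct target_hooks_sketch
{
  // Given the mode of the niter type, return the mode the target
  // would like the doloop IV to use.
  int_mode (*preferred_doloop_mode) (int_mode);
};

// Default: keep the niter type's mode unchanged.
inline int_mode default_preferred_doloop_mode (int_mode m) { return m; }

// A 64-bit target preferring word-sized counters, e.g. to avoid a
// subreg when a 32-bit counter lives in a 64-bit register.
inline int_mode word_preferred_doloop_mode (int_mode) { return int_mode::DI; }

// Mode the IV would use under a given target's hook.
inline int_mode choose_doloop_iv_mode (const target_hooks_sketch &t,
                                       int_mode niter_mode)
{
  return t.preferred_doloop_mode (niter_mode);
}
```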

Bootstrap and regtest pass on powerpc64le, is this ok for trunk?

BR.
Jiufu

gcc/ChangeLog:

2021-07-15  Jiufu Guo  

PR target/61837
* config/rs6000/rs6000.c (TARGET_PREFERRED_DOLOOP_MODE): New hook.
(rs6000_preferred_doloop_mode): New hook.
* doc/tm.texi: Regenerate.
* doc/tm.texi.in: Add hook preferred_doloop_mode.
* target.def (preferred_doloop_mode): New hook.
* targhooks.c (default_preferred_doloop_mode): New hook.
* targhooks.h (default_preferred_doloop_mode): New hook.
* tree-ssa-loop-ivopts.c (compute_doloop_base_on_mode): New function.
(add_iv_candidate_for_doloop): Call targetm.preferred_doloop_mode
and compute_doloop_base_on_mode.

gcc/testsuite/ChangeLog:

2021-07-15  Jiufu Guo  

PR target/61837
* gcc.target/powerpc/pr61837.c: New test.
My first reaction was that whatever type corresponds to the target's 
word_mode would be the right choice.  But then I remembered things like 
dbCC on m68k which had a more limited range.  While I don't think m68k 
uses the doloop bits, it's a clear example that the most desirable type 
may not correspond to the word type for the target.


So my concern with this patch is that it introduces more target
dependencies into the gimple pipeline, which is generally considered
undesirable from a design standpoint.  Is there any way to lower from
whatever type ivopts chooses to the target's desired type at the
gimple->rtl border rather than doing it in ivopts?


jeff



Re: [PATCH] rs6000: Fix restored rs6000_long_double_type_size.

2021-07-23 Thread Segher Boessenkool
Hi!

On Fri, Jul 23, 2021 at 07:47:54AM +0200, Martin Liška wrote:
> On 7/12/21 7:20 PM, Segher Boessenkool wrote:
> +static __attribute__ ((optimize ("-fno-stack-protector"))) __typeof (f) *
> >>>
> >>>-fno-stack-protector is default.
> >>
> >>Yes, but one needs an optimize attribute in order to trigger
> >>cl_target_option_save/restore
> >>mechanism.
> >
> >So it behaves differently if you select the default than if you do not
> >select anything?  That is wrong, no?
> 
> Sorry, I don't get your example, please explain it.

If -mbork is the default, the compiler should behave the same if you
invoke it with -mbork as when you do not.  And the optimize attribute
should work exactly the same as command line options.

Or perhaps you are saying you have this in the testcase only to exercise
the option save/restore code paths?  Please document that then, in the
testcase.


Segher


Re: [PATCH] rs6000: Fix restored rs6000_long_double_type_size

2021-07-23 Thread Segher Boessenkool
Hi!

On Tue, Jun 01, 2021 at 03:39:14PM +0200, Martin Liska wrote:
>   * config/rs6000/rs6000.c (rs6000_option_override_internal): When
>   a target option is restored, it can have
>   rs6000_long_double_type_size set to FLOAT_PRECISION_TFmode
>   and error should not be emitted.

> --- a/gcc/config/rs6000/rs6000.c
> +++ b/gcc/config/rs6000/rs6000.c
> @@ -4186,6 +4186,8 @@ rs6000_option_override_internal (bool global_init_p)
>else
>   rs6000_long_double_type_size = default_long_double_size;
>  }
> +  else if (rs6000_long_double_type_size == FLOAT_PRECISION_TFmode)
> +; /* The option value can be seen when cl_target_option_restore is called.  */

(line too long)

The comment is still very cryptic.  What you have in the changelog is
much better digestible.

Okay for trunk with those things fixed.  Thanks!


Segher


[PATCH] Do not use tuple-like interface for pair in unordered containers

2021-07-23 Thread Jonathan Wakely via Gcc-patches

I've been experimenting with this patch, which removes the need to use
std::tuple_element and std::get to access the members of a std::pair
in unordered_{map,multimap}.

I'm in the process of refactoring the  header to reduce
header dependencies throughout the library, and this is the only use
of the tuple-like interface for std::pair in the library.

Using tuple_element and std::get resolved PR 53339 by allowing the
std::pair type to be incomplete, however that is no longer supported
anyway (the 23_containers/unordered_map/requirements/53339.cc test
case is XFAILed). That means we could just define _Select1st as:

  struct _Select1st
  {
    template<typename _Tp>
      auto
      operator()(_Tp&& __x) const noexcept
      -> decltype(std::forward<_Tp>(__x).first)
      { return std::forward<_Tp>(__x).first; }
  };

But the approach in the patch seems OK too.
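A self-contained sketch of the trait-based `_Select1st` from the patch, rewritten with non-reserved names (`first_type`, `select1st`) in a hypothetical `sketch` namespace so it compiles outside libstdc++. The partial specializations are an assumption about what the diff's stripped template parameter lists contain. The trait maps a possibly const, possibly lvalue-reference `std::pair` type to the type of its first member, so the call operator returns `T&` for lvalue pairs and `T&&` for rvalue pairs without going through `std::get`.

```cpp
#include <type_traits>
#include <utility>

namespace sketch {

// Primary template: only pair types (and references to them) are handled.
template<typename Pair>
  struct first_type;

template<typename T, typename U>
  struct first_type<std::pair<T, U>>
  { using type = T; };

template<typename T, typename U>
  struct first_type<const std::pair<T, U>>
  { using type = const T; };

// Lvalue references to a pair yield an lvalue reference to its first member.
template<typename Pair>
  struct first_type<Pair&>
  { using type = typename first_type<Pair>::type&; };

struct select1st
{
  // Reference collapsing turns type&& into T& for lvalue arguments.
  template<typename P>
    typename first_type<P>::type&&
    operator()(P&& x) const noexcept
    { return std::forward<P>(x).first; }
};

} // namespace sketch
```

Only `std::pair` members are touched, so nothing here needs the tuple-like interface from `std::tuple_element` or `std::get`.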

We don't need _Select2nd because it's only needed in
_NodeBuilder::_S_build, and that can just access the .second member of
the pair directly. The return type of that function doesn't need to be
deduced by decltype, we can just expose the __node_type typedef of the
node generator, or we could add this to the node generators:

  using result_type = __node_type*;
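A minimal sketch of the simplified builder, using hypothetical stand-ins (`Node`, `AllocNode`, `NodeBuilder`) rather than the real libstdc++ classes. The generator publicly exposes its node type, so `build` can spell its return type as `typename NodeGen::node_type*` instead of deducing it with `decltype` over a `std::piecewise_construct` call, and it reads `.second` from the pair directly instead of using `_Select2nd`.

```cpp
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical node holding a map-style value.
template<typename Key, typename Mapped>
  struct Node
  {
    std::pair<const Key, Mapped> value;
  };

// Hypothetical node generator that publishes its node type.
template<typename Key, typename Mapped>
  struct AllocNode
  {
    using node_type = Node<Key, Mapped>;

    template<typename K, typename M>
      node_type*
      operator()(K&& k, M&& m) const
      {
        return new node_type{
          std::pair<const Key, Mapped>(std::forward<K>(k), std::forward<M>(m))};
      }
  };

struct NodeBuilder
{
  // The return type is named directly, and the mapped value comes straight
  // from arg.second with no tuple-like machinery.
  template<typename Kt, typename Arg, typename NodeGen>
    static typename NodeGen::node_type*
    build(Kt&& k, Arg&& arg, const NodeGen& gen)
    { return gen(std::forward<Kt>(k), std::forward<Arg>(arg).second); }
};
```

For example, with a hypothetical `std::pair<int, std::string> kv`, `NodeBuilder::build(kv.first, kv, AllocNode<int, std::string>{})` allocates a node holding a copy of `kv` (the caller owns and must delete it).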

None of these changes are essential, but they allow other headers in
the library to be kept smaller, so they compile faster and only
declare the components that are actually required by the standard.

What do you think?


diff --git a/libstdc++-v3/include/bits/hashtable_policy.h b/libstdc++-v3/include/bits/hashtable_policy.h
index 2130c958262..993c006f900 100644
--- a/libstdc++-v3/include/bits/hashtable_policy.h
+++ b/libstdc++-v3/include/bits/hashtable_policy.h
@@ -87,20 +87,25 @@ namespace __detail
 
   struct _Select1st
   {
-template
-  auto
-  operator()(_Tp&& __x) const noexcept
-  -> decltype(std::get<0>(std::forward<_Tp>(__x)))
-  { return std::get<0>(std::forward<_Tp>(__x)); }
-  };
+template
+  struct __1st_type;
+
+template
+  struct __1st_type>
+  { using type = _Tp; };
+
+template
+  struct __1st_type>
+  { using type = const _Tp; };
+
+template
+  struct __1st_type<_Pair&>
+  { using type = typename __1st_type<_Pair>::type&; };
 
-  struct _Select2nd
-  {
 template
-  auto
+  typename __1st_type<_Tp>::type&&
   operator()(_Tp&& __x) const noexcept
-  -> decltype(std::get<1>(std::forward<_Tp>(__x)))
-  { return std::get<1>(std::forward<_Tp>(__x)); }
+  { return std::forward<_Tp>(__x).first; }
   };
 
   template
@@ -112,14 +117,10 @@ namespace __detail
   template
 	static auto
 	_S_build(_Kt&& __k, _Arg&& __arg, const _NodeGenerator& __node_gen)
-	-> decltype(__node_gen(std::piecewise_construct,
-			   std::forward_as_tuple(std::forward<_Kt>(__k)),
-			   std::forward_as_tuple(_Select2nd{}(
-		std::forward<_Arg>(__arg)
+	-> typename _NodeGenerator::__node_type*
 	{
-	  return __node_gen(std::piecewise_construct,
-	std::forward_as_tuple(std::forward<_Kt>(__k)),
-	std::forward_as_tuple(_Select2nd{}(std::forward<_Arg>(__arg;
+	  return __node_gen(std::forward<_Kt>(__k),
+			std::forward<_Arg>(__arg).second);
 	}
 };
 
@@ -129,7 +130,7 @@ namespace __detail
   template
 	static auto
 	_S_build(_Kt&& __k, _Arg&&, const _NodeGenerator& __node_gen)
-	-> decltype(__node_gen(std::forward<_Kt>(__k)))
+	-> typename _NodeGenerator::__node_type*
 	{ return __node_gen(std::forward<_Kt>(__k)); }
 };
 
@@ -146,9 +147,10 @@ namespace __detail
   using __hashtable_alloc = _Hashtable_alloc<__node_alloc_type>;
   using __node_alloc_traits =
 	typename __hashtable_alloc::__node_alloc_traits;
-  using __node_type = typename __hashtable_alloc::__node_type;
 
 public:
+  using __node_type = typename __hashtable_alloc::__node_type;
+
   _ReuseOrAllocNode(__node_type* __nodes, __hashtable_alloc& __h)
   : _M_nodes(__nodes), _M_h(__h) { }
   _ReuseOrAllocNode(const _ReuseOrAllocNode&) = delete;
@@ -194,9 +196,10 @@ namespace __detail
 {
 private:
   using __hashtable_alloc = _Hashtable_alloc<_NodeAlloc>;
-  using __node_type = typename __hashtable_alloc::__node_type;
 
 public:
+  using __node_type = typename __hashtable_alloc::__node_type;
+
   _AllocNode(__hashtable_alloc& __h)
   : _M_h(__h) { }
 
@@ -667,8 +670,8 @@ namespace __detail
   /**
*  Primary class template _Map_base.
*
-   *  If the hashtable has a value type of the form pair and a
-   *  key extraction policy (_ExtractKey) that returns the first part
+   *  If the hashtable has a value type of the form pair and
+   *  a key extraction policy (_ExtractKey) that returns the first part
*  of the pair, the hashtable gets a mapped_type typedef.  If it
*  satisfies those criteria and also has unique keys, then it also
*  gets an operator[].
@@ -680,37 +683,38 @@ namespace __detail
 	   bool _Unique_keys = _Traits::__unique_keys::value>

Re: [PATCH] PR fortran/101536 - ICE in gfc_conv_expr_descriptor, at fortran/trans-array.c:7324

2021-07-23 Thread Harald Anlauf via Gcc-patches
Hi Tobias,

> > However, an additional plain check on e->rank != 0 also in the
> > CLASS cases fixes the original issue as well as your example:
> [...]
> > And regtests ok. :-)
> > See attached updated patch.
> 
> I think you still need to remove the 'return true;' from
> the 'if (e->rank != 0 && e->ts.type == BT_CLASS' block – to
> fall through to the e->rank check after the block.
> (When 'return true;' is gone, the '{' and '}' can also be removed.)
> 
> Reason: Assume 'CLASS(...) x'. In this case, 'x' is a scalar.
> And even after calling gfc_add_class_array_ref it remains
> a scalar and e->rank == 0.
> 
> Or in other words: I think with your current patch,
>  class(u)  :: z
>  f = size (z)
> is wrongly accepted without an error.

did you really check that?  My related testing succeeded without
and with the return (which was in the original commit by Paul).

I have nevertheless followed your advice to remove the return
statement, extended the testcase and regtested again.

Committed as https://gcc.gnu.org/g:e314cfc371d8b2405a1d81e51b90f9fb24b9061f

Thanks,
Harald



PING: [RS6000] rotate and mask constants [PR94393]

2021-07-23 Thread Pat Haugen via Gcc-patches
Ping https://gcc.gnu.org/pipermail/gcc-patches/2020-October/555760.html

I've done a current bootstrap/regtest on powerpc64/powerpc64le with no 
regressions.

-Pat


[PATCH] Fix x86/56337 : 1<<28 alignment is broken

2021-07-23 Thread apinski--- via Gcc-patches
From: Andrew Pinski 

The problem here is that the x86_64 back-end uses a signed integer
for the alignment and then divides by BITS_PER_UNIT, so if the alignment
is INT_MIN (which is what (1<<28)*8 becomes), we get the wrong result.

This fixes the problem by using unsigned for the align argument of
x86_elf_aligned_decl_common and x86_output_aligned_bss.
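A small C++ illustration of the arithmetic, assuming a 32-bit `int` and using 8 as a stand-in for `BITS_PER_UNIT`. An alignment of 1<<28 bytes is 1<<31 bits, which does not fit in a signed 32-bit `int`; on GCC's two's-complement targets it ends up as `INT_MIN`, so dividing back down gives a negative byte count, while the same value held in an `unsigned` divides to the expected 1<<28.

```cpp
#include <climits>

static_assert(sizeof(int) * CHAR_BIT == 32, "this sketch assumes a 32-bit int");

constexpr int kBitsPerUnit = 8;  // stand-in for GCC's BITS_PER_UNIT

// 1<<28 bytes of alignment expressed in bits: 1<<31.
constexpr unsigned kAlignBitsUnsigned = 1u << 31;
// The same bit pattern held in a signed int.
constexpr int kAlignBitsSigned = INT_MIN;

// Converting the alignment back to bytes.
constexpr unsigned kAlignBytesUnsigned = kAlignBitsUnsigned / kBitsPerUnit;  // 1u << 28
constexpr int kAlignBytesSigned = kAlignBitsSigned / kBitsPerUnit;           // negative
```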

OK? Bootstrapped and tested on x86_64-linux-gnu.

gcc/ChangeLog:

PR target/56337
* config/i386/i386-protos.h (x86_output_aligned_bss):
Change align argument to unsigned type.
(x86_elf_aligned_decl_common): Likewise.
* config/i386/i386.c (x86_elf_aligned_decl_common): Likewise.
(x86_output_aligned_bss): Likewise.
---
 gcc/config/i386/i386-protos.h | 4 ++--
 gcc/config/i386/i386.c| 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h
index 07ac02aff69..fa1b0d0d787 100644
--- a/gcc/config/i386/i386-protos.h
+++ b/gcc/config/i386/i386-protos.h
@@ -325,9 +325,9 @@ struct ix86_address
 extern int ix86_decompose_address (rtx, struct ix86_address *);
 extern int memory_address_length (rtx, bool);
 extern void x86_output_aligned_bss (FILE *, tree, const char *,
-   unsigned HOST_WIDE_INT, int);
+   unsigned HOST_WIDE_INT, unsigned);
 extern void x86_elf_aligned_decl_common (FILE *, tree, const char *,
-unsigned HOST_WIDE_INT, int);
+unsigned HOST_WIDE_INT, unsigned);
 
 #ifdef RTX_CODE
 extern void ix86_fp_comparison_codes (enum rtx_code code, enum rtx_code *,
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 876a19f4c1f..f86d11dfb11 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -837,7 +837,7 @@ x86_64_elf_unique_section (tree decl, int reloc)
 void
 x86_elf_aligned_decl_common (FILE *file, tree decl,
const char *name, unsigned HOST_WIDE_INT size,
-   int align)
+   unsigned align)
 {
   if ((ix86_cmodel == CM_MEDIUM || ix86_cmodel == CM_MEDIUM_PIC)
   && size > (unsigned int)ix86_section_threshold)
@@ -858,7 +858,7 @@ x86_elf_aligned_decl_common (FILE *file, tree decl,
 
 void
 x86_output_aligned_bss (FILE *file, tree decl, const char *name,
-   unsigned HOST_WIDE_INT size, int align)
+   unsigned HOST_WIDE_INT size, unsigned align)
 {
   if ((ix86_cmodel == CM_MEDIUM || ix86_cmodel == CM_MEDIUM_PIC)
   && size > (unsigned int)ix86_section_threshold)
-- 
2.17.1



Re: [PATCH 3/3] [PR libfortran/101305] Fix ISO_Fortran_binding.h paths in gfortran testsuite

2021-07-23 Thread Sandra Loosemore

On 7/23/21 8:15 AM, Tobias Burnus wrote:

Hi Sandra,

On 21.07.21 12:17, Tobias Burnus wrote:

On 13.07.21 23:28, Sandra Loosemore wrote:

ISO_Fortran_binding.h is now generated in the libgfortran build
directory where it is on the default include path.  Adjust includes in
the gfortran testsuite not to include an explicit path pointing at the
source directory.

...

-#include "../../../libgfortran/ISO_Fortran_binding.h"
+#include "ISO_Fortran_binding.h"

Unfortunately, that does not help.


It seems as if the following works in the *.exp file:

# Flags for finding libgfortran ISO*.h files.
if [info exists TOOL_OPTIONS] {
    set specpath [get_multilibs ${TOOL_OPTIONS}]
} else {
    set specpath [get_multilibs]
}
set options "-I $specpath/libgfortran/"

I am not sure whether that should/can be added into
   gfortran.dg/dg.exp
or whether we only want to do this in
   ts29113/ts29113.exp
alias
   c-interop/interop.exp
   f18-c-interop/interop.exp
   ...

That seems to work fine with -m32 and -m64.


Well, given that the original patch in this thread was for tests outside 
the ts29113 testsuite, any fix has to go someplace where those tests 
would pick it up too.


I'm not seeing the include path failures Tobias is seeing, so I can't 
confirm his change fixes them, either.  When I do "make check-fortran" 
in my build tree, it seems to be finding the include files because of 
there being a pile of -B options added to the gfortran command line.  I 
don't know where those are coming from, or why this isn't working for 
Tobias.  :-S


-Sandra



Re: [PATCH] c++: suppress all warnings on member pointers to work around dICE [PR101219]

2021-07-23 Thread Sergei Trofimovich via Gcc-patches
On Fri, 23 Jul 2021 10:33:09 -0600
Jeff Law  wrote:

> On 7/22/2021 5:15 PM, Sergei Trofimovich via Gcc-patches wrote:
> > From: Sergei Trofimovich 
> >
> > r12-1804 ("cp: add support for per-location warning groups.") among other
> > things removed warning suppression from a few places including ptrmemfuncs.
> >
> > Currently ptrmemfuncs don't have valid BINFO attached which causes ICEs
> > in access checks:
> >
> >  crash_signal
> >  gcc/toplev.c:328
> >  perform_or_defer_access_check(tree_node*, tree_node*, tree_node*, int, access_failure_info*)
> >  gcc/cp/semantics.c:490
> >  finish_non_static_data_member(tree_node*, tree_node*, tree_node*)
> >  gcc/cp/semantics.c:2208
> >  ...
> >
> > The change suppresses warnings again until we provide BINFOs for 
> > ptrmemfuncs.
> >
> > PR c++/101219
> >
> > gcc/cp/ChangeLog:
> >
> > * typeck.c (build_ptrmemfunc_access_expr): Suppress all warnings
> > to avoid ICE.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * g++.dg/torture/pr101219.C: New test.  
> The C++ maintainers have the final say here, but ISTM that warning 
> suppression shouldn't be used to avoid an ICE, even an ICE within the 
> warning or diagnostic code itself.

Sounds good. I agree that fixing it correctly is preferable, and it should
improve the diagnostic on this very test case compared to gcc-11.

I'll need some help plumbing TYPE_BINFO() around build_ptrmemfunc_type().
My attempts to use `xref_basetypes (t, NULL_TREE);` to derive it for a
fresh expression only shuffles ICEs around.

-- 

  Sergei