OpenMP patch review and work in progress

2022-05-02 Thread Jakub Jelinek via Gcc
Hi!

Now that GCC 12 branched, can I ask you for:
1) pings for OpenMP patches that are ready and you'd like to get reviewed for
   GCC 13
2) what OpenMP 5.0/5.1/5.2 features people are already working on and plan to
   post patches later during stage1

Thanks

Jakub



Re: GCC 12.1 Release Candidate available from gcc.gnu.org

2022-05-02 Thread Boris Kolpackov
Jakub Jelinek  writes:

> The first release candidate for GCC 12.1 is available [...]

There is an unfixed bogus warning that is a regression in 12
and that I think will have a pretty wide effect (any code
that assigns/appends a 1-char string literal to std::string):

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105329

For example, in my relatively small codebase I had about 20
instances of this warning. Seeing that it's enabled as part
of -Wall (not just -Wextra), I believe there will be a lot
of grumpy users.

There is a proposed work around in this (duplicate) bug that
looks pretty simple:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104336

Perhaps it makes sense to consider it?


Re: GCC 12.1 Release Candidate available from gcc.gnu.org

2022-05-02 Thread Marc Glisse via Gcc

On Mon, 2 May 2022, Boris Kolpackov wrote:


Jakub Jelinek  writes:


The first release candidate for GCC 12.1 is available [...]


There is an unfixed bogus warning that is a regression in 12
and that I think will have a pretty wide effect (any code
that assigns/appends a 1-char string literal to std::string):

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105329

For example, in my relatively small codebase I had about 20
instances of this warning. Seeing that it's enabled as part
of -Wall (not just -Wextra), I believe there will be a lot
of grumpy users.

There is a proposed work around in this (duplicate) bug that
looks pretty simple:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104336

Perhaps it makes sense to consider it?


Please no, that workaround looks like a fragile hack (basically writing 
a+b-a to obfuscate b) that may break if you look at it sideways and likely 
makes the generated code a bit worse. Besides, we should take it as a hint 
that user code is also likely to trigger the warning by accident.


IMO there are several ways to make progress on this in parallel:

* improve the optimizer (as investigated by Andrew)

* tweak the warning so it becomes either cleverer or less eager (maybe 
even use the fact that this special case is in a system header? that 
should be a last resort though)


* battle that has already been lost, no need to rehash it:

--- a/gcc/c-family/c.opt
+++ b/gcc/c-family/c.opt
@@ -1450,3 +1450,3 @@ Warn when a declaration has duplicate const, 
volatile, restrict or _Atomic speci

 Wrestrict
-C ObjC C++ ObjC++ Var(warn_restrict) Warning LangEnabledBy(C ObjC C++ ObjC++, 
Wall)
+C ObjC C++ ObjC++ Var(warn_restrict) Warning LangEnabledBy(C ObjC C++ ObjC++, 
Wextra)
 Warn when an argument passed to a restrict-qualified parameter aliases with

Or admit that

--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -5729,4 +5729,3 @@ by this option and not enabled by the latter and vice 
versa.
 This enables all the warnings about constructions that some users
-consider questionable, and that are easy to avoid (or modify to
-prevent the warning), even in conjunction with macros.  This also
+consider questionable.  This also
 enables some language-specific warnings described in @ref{C++ Dialect

--
Marc Glisse


OpenMP patch review and work in progress

2022-05-02 Thread Tobias Burnus
Hi Jakub, hello all,

let's start with a list of smaller patches, which are pending review
but each of them should be relatively quickly approvable. Some are really
tiny and obvious bug fixes - others are a bit larger but still smallish:

* [Patch] OpenMP: Fix use_device_{addr,ptr} with in-data-sharing arg
  https://gcc.gnu.org/pipermail/gcc-patches/2022-April/593419.html
* [Patch] OpenMP, libgomp, gimple: omp_get_max_teams, omp_set_num_teams, and 
omp_{gs}et_teams_thread_limit on offload devices
  https://gcc.gnu.org/pipermail/gcc-patches/2022-April/593260.html
* [Patch] OpenMP, libgomp: Add new runtime routine omp_target_is_accessible.
  https://gcc.gnu.org/pipermail/gcc-patches/2022-March/591601.html
* [PATCH, OpenMP, C++] Allow classes with static members to be mappable
  https://gcc.gnu.org/pipermail/gcc-patches/2022-March/591449.html
* [PATCH] OpenMP, C++: Add template support for the has_device_addr clause.
  https://gcc.gnu.org/pipermail/gcc-patches/2022-February/590813.html
* [PATCH, OpenMP, C/C++] Handle array reference base-pointers in array sections
  https://gcc.gnu.org/pipermail/gcc-patches/2022-February/590658.html
* OpenMP, libgomp: Add new runtime routines omp_target_memcpy_async and 
omp_target_memcpy_rect_async
  https://gcc.gnu.org/pipermail/gcc-patches/2022-February/590651.html
* [PATCH] Add a restriction on allocate clause (OpenMP 5.0)
  https://gcc.gnu.org/pipermail/gcc-patches/2022-February/590597.html
* OpenMP, libgomp: Environment variable syntax extension
  https://gcc.gnu.org/pipermail/gcc-patches/2022-January/588728.html

I think it makes sense to reduce this list before starting on the larger list.
On the larger side, I collected the following ones.

* Memory Management
- OpenMP: allow requires dynamic_allocators
  https://gcc.gnu.org/pipermail/gcc-patches/2021-December/587201.html
- libgomp, nvptx: low-latency memory allocator
  https://gcc.gnu.org/pipermail/gcc-patches/2021-December/587207.html
- Pinned Memory thread
  https://gcc.gnu.org/pipermail/gcc-patches/2022-January/587612.html
- Support for allocate directive (OpenMP 5.0)
  https://gcc.gnu.org/pipermail/gcc-patches/2022-January/588367.html

* declare mapper
  https://gcc.gnu.org/pipermail/gcc-patches/2022-March/591973.html

* Metadirectives
  https://gcc.gnu.org/pipermail/gcc-patches/2021-December/586600.html

* [PATCH 0/5] openmp: Handle pinned and unified shared memory.
  https://gcc.gnu.org/pipermail/gcc-patches/2022-March/591349.html

* [PATCH 00/16] OpenMP: lvalues in "map" clauses and struct handling rework
  https://gcc.gnu.org/pipermail/gcc-patches/2021-November/584994.html

Plus:

* Fortran deep-mapping patch
  I intent to submit it in pieces.
  Lastest email in the thread:
  https://gcc.gnu.org/pipermail/gcc-patches/2022-April/593562.html

* OpenACC Kernels/Graphite: That's not OpenMP but rather generic, but I
  want to mention it for completeness:
   https://gcc.gnu.org/pipermail/gcc-patches/2021-December/586901.html

Tobias



Re: GCC 12.1 Release Candidate available from gcc.gnu.org

2022-05-02 Thread Jonathan Wakely via Gcc
On Mon, 2 May 2022, 13:26 Marc Glisse via Gcc,  wrote:

> On Mon, 2 May 2022, Boris Kolpackov wrote:
>
> > Jakub Jelinek  writes:
> >
> >> The first release candidate for GCC 12.1 is available [...]
> >
> > There is an unfixed bogus warning that is a regression in 12
> > and that I think will have a pretty wide effect (any code
> > that assigns/appends a 1-char string literal to std::string):
> >
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105329
> >
> > For example, in my relatively small codebase I had about 20
> > instances of this warning. Seeing that it's enabled as part
> > of -Wall (not just -Wextra), I believe there will be a lot
> > of grumpy users.
> >
> > There is a proposed work around in this (duplicate) bug that
> > looks pretty simple:
> >
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104336
> >
> > Perhaps it makes sense to consider it?
>
> Please no, that workaround looks like a fragile hack (basically writing
> a+b-a to obfuscate b) that may break if you look at it sideways and likely
> makes the generated code a bit worse.


Agreed. And it makes the variable names completely misleading (although
that would be easy to fix).


Re: How to regenerate aclocal.m4?

2022-05-02 Thread Eric Gallager via Gcc
On Sun, May 1, 2022 at 7:09 PM Maciej W. Rozycki  wrote:
>
> On Sat, 30 Apr 2022, Zopolis0 via Gcc wrote:
>
> > I'm trying to regenerate autotools files in liboffloadmic (and other
> > directories) but when running automake it tells me I need to run aclocal,
> > which gives me a warning about not finding AM_ENABLE_MULTILIB in library,
> > which then translates into an autoconf error. How do I fix this? Is there a
> > specific aclocal version?
>
>  You need to use the same options that automatic regeneration via `make'
> would.  They are given in ACLOCAL_AMFLAGS near the top of Makefile.am.
>
>  HTH,
>
>   Maciej

To go a bit further into detail, what's necessary is that the correct
directory is searched for macro includes. AM_ENABLE_MULTILIB is
defined in `config/multi.m4`, so you need `-I config` in the flags you
use with aclocal (or whatever the relative path to config is from your
current working directory).

Eric


Multiple types of load/store: how to create .md rules?

2022-05-02 Thread Andras Tantos
All,

Thanks for all the help from the past. I'm (still) working on porting
GCC to a new processor ISA and ran into the following problem: the CPU
supports two kinds of register+offset based loads (and stores).

The generic format accepts any base register and any offset. The syntax
for this type of operation is:

  $rX <- mem[$rB + ]

For code compatness reasons, there's another (shorter) form, that
accepts only $sp and $fp (the stack and frame pointers) as base
registers, and an offset in the range of -256 and 252. Finally this
form inly supports a transfer size of 32-bits. The syntax for this
format is:

  $rX <- mem[tiny $rB + ]

I would like to coerce GCC into using the 'tiny' form, whenever it can.
In order to do that, I've created a new predicate:

  (define_predicate "brew_tiny_memory_operand"
(and
  (match_operand 0 "memory_operand")
  (match_test
"MEM_P(op) && 
brew_legitimate_tiny_address_p(XEXP(op, 0))"
  )
)
  )

The function referenced in the predicate is as follows:

  static bool
  brew_reg_ok_for_tiny_base_p(const_rtx reg)
  {
int regno = REGNO(reg);

return
  regno == BREW_REG_FP ||
  regno == BREW_REG_SP ||
  regno == BREW_QFP ||
  regno == BREW_QAP;
  }

  bool
  brew_legitimate_tiny_address_p(rtx x)
  {
if (
  GET_CODE(x) == PLUS &&
  REG_P(XEXP(x, 0)) &&
  brew_reg_ok_for_tiny_base_p(XEXP(x, 0)) &&
  CONST_INT_P(XEXP(x, 1)) &&
  IN_RANGE(INTVAL(XEXP(x, 1)), -256, 252) &&
  (INTVAL(XEXP(x,1)) & 3) == 0
)
  return true;
return false;
  }

Finally, I've created rules for the use of these new predicates:

  (define_expand "movsi"
[(set
  (match_operand:SI 0 "general_operand" "")
  (match_operand:SI 1 "general_operand" "")
)]
""
"
  {
if (!(reload_in_progress || reload_completed))
  {
if(MEM_P(operands[0]))
  {
// For stores, force the second arg. into a register
operands[1] = force_reg(SImode, operands[1]);
// We should make sure that the address
// generated for the store is based on a
// + pattern
if(MEM_P(XEXP(operands[0], 0)))
  operands[0] = gen_rtx_MEM(
SImode,
force_reg(SImode, XEXP(operands[0], 0))
  );
  }
else if(MEM_P(operands[1]))
  {
// We should make sure that the address
// generated for the load is based on a
// + pattern
if(MEM_P(XEXP (operands[1], 0)))
  operands[1] = gen_rtx_MEM(
SImode,
force_reg(SImode, XEXP(operands[1], 0))
  );
  }
  }
  }")

  (define_insn "*movsi_tiny_store"
[(set
  (match_operand:SI 0 "brew_tiny_memory_operand"  "=m")
  (match_operand:SI 1 "register_operand"  "r")
)]
""
"mem[tiny %0] <- %1"
[(set_attr "length" "2")]
  )

  (define_insn "*movsi_tiny_load"
[(set
  (match_operand:SI 0 "register_operand"  "=r")
  (match_operand:SI 1 "brew_tiny_memory_operand"  "m")
)]
""
"%0 <- mem[tiny %1]"
[(set_attr "length" "2")]
  )

  (define_insn "*movsi_general"
[(set
  (match_operand:SI 0 "nonimmediate_operand"  "=r,r,r,r,m,r")
  (match_operand:SI 1 "general_operand""N,L,i,r,r,m")
)]
""
"@
%0 <- tiny %1
%0 <- short %1
%0 <- %1
%0 <- %1
mem[%0] <- %1
%0 <- mem[%1]"
[(set_attr "length" "2,4,6,2,6,6")]
  )

When I tested this code, I've noticed a funny thing: the function
prologs and epilogs seem to use the 'tiny' versions of loads/stores
just fine. However (I think) some of the spills/reloads for local
variables end up using the extended version. Even more surprising is
that this behavior only manifests itself during optimized (-Os, -O1,
-O2) compilation. It seems that -O0 is free from this problem. Here's
one example:

.file   "dtoa.c"
.text
.global __udivsi3
.p2align1
.global quorem
.type   quorem, @function
  quorem:
mem[tiny $sp + -4] <- $fp ### <--- OK
mem[tiny $sp + -8] <- $lr
mem[tiny $sp + -12] <- $r8
mem[tiny $sp + -16] <- $r9
mem[tiny $sp + -20] <- $r10
mem[tiny $sp + -24] <- $r11
$fp <- $sp
$sp <- short $sp - 48
$r9 <- $a0
$r8 <- mem[$a1 + 16]
$r0 <- mem[$a0 + 16]
if signed $r0 < $r8 $pc <- .L7
$r8 <- tiny $r8 + -1
$r0 <- tiny 2
$r0 <- $r8 << $r0
$r10 <- short $a0 + 20
$r1 <- $r10 + $r0
mem[$fp + -32] <- $r1  ## < should be 'tiny'
$r11 <- mem[$r1]

To a previous problem I've asked, Andrew Pinski replied that I should
merge all *movsi patterns into a single one to avoid (in that case)
strange deletions in the generated assembly. Is that possible here? It
appears to me that I would need the ability to differentiate the
different patterns using constraints, but is t