OpenMP patch review and work in progress
Hi! Now that GCC 12 branched, can I ask you for: 1) pings for OpenMP patches that are ready and you'd like to get reviewed for GCC 13 2) what OpenMP 5.0/5.1/5.2 features people are already working on and plan to post patches later during stage1 Thanks Jakub
Re: GCC 12.1 Release Candidate available from gcc.gnu.org
Jakub Jelinek writes: > The first release candidate for GCC 12.1 is available [...] There is an unfixed bogus warning that is a regression in 12 and that I think will have a pretty wide effect (any code that assigns/appends a 1-char string literal to std::string): https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105329 For example, in my relatively small codebase I had about 20 instances of this warning. Seeing that it's enabled as part of -Wall (not just -Wextra), I believe there will be a lot of grumpy users. There is a proposed work around in this (duplicate) bug that looks pretty simple: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104336 Perhaps it makes sense to consider it?
Re: GCC 12.1 Release Candidate available from gcc.gnu.org
On Mon, 2 May 2022, Boris Kolpackov wrote: Jakub Jelinek writes: The first release candidate for GCC 12.1 is available [...] There is an unfixed bogus warning that is a regression in 12 and that I think will have a pretty wide effect (any code that assigns/appends a 1-char string literal to std::string): https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105329 For example, in my relatively small codebase I had about 20 instances of this warning. Seeing that it's enabled as part of -Wall (not just -Wextra), I believe there will be a lot of grumpy users. There is a proposed work around in this (duplicate) bug that looks pretty simple: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104336 Perhaps it makes sense to consider it? Please no, that workaround looks like a fragile hack (basically writing a+b-a to obfuscate b) that may break if you look at it sideways and likely makes the generated code a bit worse. Besides, we should take it as a hint that user code is also likely to trigger the warning by accident. IMO there are several ways to make progress on this in parallel: * improve the optimizer (as investigated by Andrew) * tweak the warning so it becomes either cleverer or less eager (maybe even use the fact that this special case is in a system header? that should be a last resort though) * battle that has already been lost, no need to rehash it: --- a/gcc/c-family/c.opt +++ b/gcc/c-family/c.opt @@ -1450,3 +1450,3 @@ Warn when a declaration has duplicate const, volatile, restrict or _Atomic speci Wrestrict -C ObjC C++ ObjC++ Var(warn_restrict) Warning LangEnabledBy(C ObjC C++ ObjC++, Wall) +C ObjC C++ ObjC++ Var(warn_restrict) Warning LangEnabledBy(C ObjC C++ ObjC++, Wextra) Warn when an argument passed to a restrict-qualified parameter aliases with Or admit that --- a/gcc/doc/invoke.texi +++ b/gcc/doc/invoke.texi @@ -5729,4 +5729,3 @@ by this option and not enabled by the latter and vice versa. This enables all the warnings about constructions that some users -consider questionable, and that are easy to avoid (or modify to -prevent the warning), even in conjunction with macros. This also +consider questionable. This also enables some language-specific warnings described in @ref{C++ Dialect -- Marc Glisse
OpenMP patch review and work in progress
Hi Jakub, hello all, let's start with a list of smaller patches, which are pending review but each of them should be relatively quickly approvable. Some are really tiny and obvious bug fixes - others are a bit larger but still smallish: * [Patch] OpenMP: Fix use_device_{addr,ptr} with in-data-sharing arg https://gcc.gnu.org/pipermail/gcc-patches/2022-April/593419.html * [Patch] OpenMP, libgomp, gimple: omp_get_max_teams, omp_set_num_teams, and omp_{gs}et_teams_thread_limit on offload devices https://gcc.gnu.org/pipermail/gcc-patches/2022-April/593260.html * [Patch] OpenMP, libgomp: Add new runtime routine omp_target_is_accessible. https://gcc.gnu.org/pipermail/gcc-patches/2022-March/591601.html * [PATCH, OpenMP, C++] Allow classes with static members to be mappable https://gcc.gnu.org/pipermail/gcc-patches/2022-March/591449.html * [PATCH] OpenMP, C++: Add template support for the has_device_addr clause. https://gcc.gnu.org/pipermail/gcc-patches/2022-February/590813.html * [PATCH, OpenMP, C/C++] Handle array reference base-pointers in array sections https://gcc.gnu.org/pipermail/gcc-patches/2022-February/590658.html * OpenMP, libgomp: Add new runtime routines omp_target_memcpy_async and omp_target_memcpy_rect_async https://gcc.gnu.org/pipermail/gcc-patches/2022-February/590651.html * [PATCH] Add a restriction on allocate clause (OpenMP 5.0) https://gcc.gnu.org/pipermail/gcc-patches/2022-February/590597.html * OpenMP, libgomp: Environment variable syntax extension https://gcc.gnu.org/pipermail/gcc-patches/2022-January/588728.html I think it makes sense to reduce this list before starting on the larger list. On the larger side, I collected the following ones. * Memory Management - OpenMP: allow requires dynamic_allocators https://gcc.gnu.org/pipermail/gcc-patches/2021-December/587201.html - libgomp, nvptx: low-latency memory allocator https://gcc.gnu.org/pipermail/gcc-patches/2021-December/587207.html - Pinned Memory thread https://gcc.gnu.org/pipermail/gcc-patches/2022-January/587612.html - Support for allocate directive (OpenMP 5.0) https://gcc.gnu.org/pipermail/gcc-patches/2022-January/588367.html * declare mapper https://gcc.gnu.org/pipermail/gcc-patches/2022-March/591973.html * Metadirectives https://gcc.gnu.org/pipermail/gcc-patches/2021-December/586600.html * [PATCH 0/5] openmp: Handle pinned and unified shared memory. https://gcc.gnu.org/pipermail/gcc-patches/2022-March/591349.html * [PATCH 00/16] OpenMP: lvalues in "map" clauses and struct handling rework https://gcc.gnu.org/pipermail/gcc-patches/2021-November/584994.html Plus: * Fortran deep-mapping patch I intent to submit it in pieces. Lastest email in the thread: https://gcc.gnu.org/pipermail/gcc-patches/2022-April/593562.html * OpenACC Kernels/Graphite: That's not OpenMP but rather generic, but I want to mention it for completeness: https://gcc.gnu.org/pipermail/gcc-patches/2021-December/586901.html Tobias
Re: GCC 12.1 Release Candidate available from gcc.gnu.org
On Mon, 2 May 2022, 13:26 Marc Glisse via Gcc, wrote: > On Mon, 2 May 2022, Boris Kolpackov wrote: > > > Jakub Jelinek writes: > > > >> The first release candidate for GCC 12.1 is available [...] > > > > There is an unfixed bogus warning that is a regression in 12 > > and that I think will have a pretty wide effect (any code > > that assigns/appends a 1-char string literal to std::string): > > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105329 > > > > For example, in my relatively small codebase I had about 20 > > instances of this warning. Seeing that it's enabled as part > > of -Wall (not just -Wextra), I believe there will be a lot > > of grumpy users. > > > > There is a proposed work around in this (duplicate) bug that > > looks pretty simple: > > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104336 > > > > Perhaps it makes sense to consider it? > > Please no, that workaround looks like a fragile hack (basically writing > a+b-a to obfuscate b) that may break if you look at it sideways and likely > makes the generated code a bit worse. Agreed. And it makes the variable names completely misleading (although that would be easy to fix).
Re: How to regenerate aclocal.m4?
On Sun, May 1, 2022 at 7:09 PM Maciej W. Rozycki wrote: > > On Sat, 30 Apr 2022, Zopolis0 via Gcc wrote: > > > I'm trying to regenerate autotools files in liboffloadmic (and other > > directories) but when running automake it tells me I need to run aclocal, > > which gives me a warning about not finding AM_ENABLE_MULTILIB in library, > > which then translates into an autoconf error. How do I fix this? Is there a > > specific aclocal version? > > You need to use the same options that automatic regeneration via `make' > would. They are given in ACLOCAL_AMFLAGS near the top of Makefile.am. > > HTH, > > Maciej To go a bit further into detail, what's necessary is that the correct directory is searched for macro includes. AM_ENABLE_MULTILIB is defined in `config/multi.m4`, so you need `-I config` in the flags you use with aclocal (or whatever the relative path to config is from your current working directory). Eric
Multiple types of load/store: how to create .md rules?
All, Thanks for all the help from the past. I'm (still) working on porting GCC to a new processor ISA and ran into the following problem: the CPU supports two kinds of register+offset based loads (and stores). The generic format accepts any base register and any offset. The syntax for this type of operation is: $rX <- mem[$rB + ] For code compatness reasons, there's another (shorter) form, that accepts only $sp and $fp (the stack and frame pointers) as base registers, and an offset in the range of -256 and 252. Finally this form inly supports a transfer size of 32-bits. The syntax for this format is: $rX <- mem[tiny $rB + ] I would like to coerce GCC into using the 'tiny' form, whenever it can. In order to do that, I've created a new predicate: (define_predicate "brew_tiny_memory_operand" (and (match_operand 0 "memory_operand") (match_test "MEM_P(op) && brew_legitimate_tiny_address_p(XEXP(op, 0))" ) ) ) The function referenced in the predicate is as follows: static bool brew_reg_ok_for_tiny_base_p(const_rtx reg) { int regno = REGNO(reg); return regno == BREW_REG_FP || regno == BREW_REG_SP || regno == BREW_QFP || regno == BREW_QAP; } bool brew_legitimate_tiny_address_p(rtx x) { if ( GET_CODE(x) == PLUS && REG_P(XEXP(x, 0)) && brew_reg_ok_for_tiny_base_p(XEXP(x, 0)) && CONST_INT_P(XEXP(x, 1)) && IN_RANGE(INTVAL(XEXP(x, 1)), -256, 252) && (INTVAL(XEXP(x,1)) & 3) == 0 ) return true; return false; } Finally, I've created rules for the use of these new predicates: (define_expand "movsi" [(set (match_operand:SI 0 "general_operand" "") (match_operand:SI 1 "general_operand" "") )] "" " { if (!(reload_in_progress || reload_completed)) { if(MEM_P(operands[0])) { // For stores, force the second arg. into a register operands[1] = force_reg(SImode, operands[1]); // We should make sure that the address // generated for the store is based on a // + pattern if(MEM_P(XEXP(operands[0], 0))) operands[0] = gen_rtx_MEM( SImode, force_reg(SImode, XEXP(operands[0], 0)) ); } else if(MEM_P(operands[1])) { // We should make sure that the address // generated for the load is based on a // + pattern if(MEM_P(XEXP (operands[1], 0))) operands[1] = gen_rtx_MEM( SImode, force_reg(SImode, XEXP(operands[1], 0)) ); } } }") (define_insn "*movsi_tiny_store" [(set (match_operand:SI 0 "brew_tiny_memory_operand" "=m") (match_operand:SI 1 "register_operand" "r") )] "" "mem[tiny %0] <- %1" [(set_attr "length" "2")] ) (define_insn "*movsi_tiny_load" [(set (match_operand:SI 0 "register_operand" "=r") (match_operand:SI 1 "brew_tiny_memory_operand" "m") )] "" "%0 <- mem[tiny %1]" [(set_attr "length" "2")] ) (define_insn "*movsi_general" [(set (match_operand:SI 0 "nonimmediate_operand" "=r,r,r,r,m,r") (match_operand:SI 1 "general_operand""N,L,i,r,r,m") )] "" "@ %0 <- tiny %1 %0 <- short %1 %0 <- %1 %0 <- %1 mem[%0] <- %1 %0 <- mem[%1]" [(set_attr "length" "2,4,6,2,6,6")] ) When I tested this code, I've noticed a funny thing: the function prologs and epilogs seem to use the 'tiny' versions of loads/stores just fine. However (I think) some of the spills/reloads for local variables end up using the extended version. Even more surprising is that this behavior only manifests itself during optimized (-Os, -O1, -O2) compilation. It seems that -O0 is free from this problem. Here's one example: .file "dtoa.c" .text .global __udivsi3 .p2align1 .global quorem .type quorem, @function quorem: mem[tiny $sp + -4] <- $fp ### <--- OK mem[tiny $sp + -8] <- $lr mem[tiny $sp + -12] <- $r8 mem[tiny $sp + -16] <- $r9 mem[tiny $sp + -20] <- $r10 mem[tiny $sp + -24] <- $r11 $fp <- $sp $sp <- short $sp - 48 $r9 <- $a0 $r8 <- mem[$a1 + 16] $r0 <- mem[$a0 + 16] if signed $r0 < $r8 $pc <- .L7 $r8 <- tiny $r8 + -1 $r0 <- tiny 2 $r0 <- $r8 << $r0 $r10 <- short $a0 + 20 $r1 <- $r10 + $r0 mem[$fp + -32] <- $r1 ## < should be 'tiny' $r11 <- mem[$r1] To a previous problem I've asked, Andrew Pinski replied that I should merge all *movsi patterns into a single one to avoid (in that case) strange deletions in the generated assembly. Is that possible here? It appears to me that I would need the ability to differentiate the different patterns using constraints, but is t