[PATCH] builtins: Add various __builtin_*f{16,32,64,128,32x,64x,128x} builtins

2022-10-16 Thread Jakub Jelinek via Gcc-patches
Hi!

When working on libstdc++ extended float support in , I found that
we need various builtins for the _Float{16,32,64,128,32x,64x,128x} types.
Glibc 2.26 and later provides the underlying libm routines (except for
_Float16 and _Float128x for the time being) and in libstdc++ I think we
need at least the _Float128 builtins on x86_64, i?86, powerpc64le and ia64
(when long double is IEEE quad, we can handle it by using __builtin_*l
instead), because without the builtins the overloads couldn't be constexpr
(say when it would declare the *f128 extern "C" routines itself and call
them).

The testcase covers just types of those builtins and their constant
folding, so doesn't need actual libm support.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2022-10-15  Jakub Jelinek  

* builtin-types.def (BT_FLOAT16_PTR, BT_FLOAT32_PTR, BT_FLOAT64_PTR,
BT_FLOAT128_PTR, BT_FLOAT32X_PTR, BT_FLOAT64X_PTR, BT_FLOAT128X_PTR):
New DEF_PRIMITIVE_TYPE.
(BT_FN_INT_FLOAT16, BT_FN_INT_FLOAT32, BT_FN_INT_FLOAT64,
BT_FN_INT_FLOAT128, BT_FN_INT_FLOAT32X, BT_FN_INT_FLOAT64X,
BT_FN_INT_FLOAT128X, BT_FN_LONG_FLOAT16, BT_FN_LONG_FLOAT32,
BT_FN_LONG_FLOAT64, BT_FN_LONG_FLOAT128, BT_FN_LONG_FLOAT32X,
BT_FN_LONG_FLOAT64X, BT_FN_LONG_FLOAT128X, BT_FN_LONGLONG_FLOAT16,
BT_FN_LONGLONG_FLOAT32, BT_FN_LONGLONG_FLOAT64,
BT_FN_LONGLONG_FLOAT128, BT_FN_LONGLONG_FLOAT32X,
BT_FN_LONGLONG_FLOAT64X, BT_FN_LONGLONG_FLOAT128X): New
DEF_FUNCTION_TYPE_1.
(BT_FN_FLOAT16_FLOAT16_FLOAT16PTR, BT_FN_FLOAT32_FLOAT32_FLOAT32PTR,
BT_FN_FLOAT64_FLOAT64_FLOAT64PTR, BT_FN_FLOAT128_FLOAT128_FLOAT128PTR,
BT_FN_FLOAT32X_FLOAT32X_FLOAT32XPTR,
BT_FN_FLOAT64X_FLOAT64X_FLOAT64XPTR,
BT_FN_FLOAT128X_FLOAT128X_FLOAT128XPTR, BT_FN_FLOAT16_FLOAT16_INT,
BT_FN_FLOAT32_FLOAT32_INT, BT_FN_FLOAT64_FLOAT64_INT,
BT_FN_FLOAT128_FLOAT128_INT, BT_FN_FLOAT32X_FLOAT32X_INT,
BT_FN_FLOAT64X_FLOAT64X_INT, BT_FN_FLOAT128X_FLOAT128X_INT,
BT_FN_FLOAT16_FLOAT16_INTPTR, BT_FN_FLOAT32_FLOAT32_INTPTR,
BT_FN_FLOAT64_FLOAT64_INTPTR, BT_FN_FLOAT128_FLOAT128_INTPTR,
BT_FN_FLOAT32X_FLOAT32X_INTPTR, BT_FN_FLOAT64X_FLOAT64X_INTPTR,
BT_FN_FLOAT128X_FLOAT128X_INTPTR, BT_FN_FLOAT16_FLOAT16_LONG,
BT_FN_FLOAT32_FLOAT32_LONG, BT_FN_FLOAT64_FLOAT64_LONG,
BT_FN_FLOAT128_FLOAT128_LONG, BT_FN_FLOAT32X_FLOAT32X_LONG,
BT_FN_FLOAT64X_FLOAT64X_LONG, BT_FN_FLOAT128X_FLOAT128X_LONG): New
DEF_FUNCTION_TYPE_2.
(BT_FN_FLOAT16_FLOAT16_FLOAT16_INTPTR,
BT_FN_FLOAT32_FLOAT32_FLOAT32_INTPTR,
BT_FN_FLOAT64_FLOAT64_FLOAT64_INTPTR,
BT_FN_FLOAT128_FLOAT128_FLOAT128_INTPTR,
BT_FN_FLOAT32X_FLOAT32X_FLOAT32X_INTPTR,
BT_FN_FLOAT64X_FLOAT64X_FLOAT64X_INTPTR,
BT_FN_FLOAT128X_FLOAT128X_FLOAT128X_INTPTR): New DEF_FUNCTION_TYPE_3.
* builtins.def (ACOSH_TYPE, ATAN2_TYPE, ATANH_TYPE, COSH_TYPE,
FDIM_TYPE, HUGE_VAL_TYPE, HYPOT_TYPE, ILOGB_TYPE, LDEXP_TYPE,
LGAMMA_TYPE, LLRINT_TYPE, LOG10_TYPE, LRINT_TYPE, MODF_TYPE,
NEXTAFTER_TYPE, REMQUO_TYPE, SCALBLN_TYPE, SCALBN_TYPE, SINH_TYPE):
Define and undefine later.
(FMIN_TYPE, SQRT_TYPE): Undefine at a later line.
(INF_TYPE): Define at a later line.
(BUILT_IN_ACOSH, BUILT_IN_ACOS, BUILT_IN_ASINH, BUILT_IN_ASIN,
BUILT_IN_ATAN2, BUILT_IN_ATANH, BUILT_IN_ATAN, BUILT_IN_CBRT,
BUILT_IN_COSH, BUILT_IN_COS, BUILT_IN_ERFC, BUILT_IN_ERF,
BUILT_IN_EXP2, BUILT_IN_EXP, BUILT_IN_EXPM1, BUILT_IN_FDIM,
BUILT_IN_FMOD, BUILT_IN_FREXP, BUILT_IN_HYPOT, BUILT_IN_ILOGB,
BUILT_IN_LDEXP, BUILT_IN_LGAMMA, BUILT_IN_LLRINT, BUILT_IN_LLROUND,
BUILT_IN_LOG10, BUILT_IN_LOG1P, BUILT_IN_LOG2, BUILT_IN_LOGB,
BUILT_IN_LOG, BUILT_IN_LRINT, BUILT_IN_LROUND, BUILT_IN_MODF,
BUILT_IN_NEXTAFTER, BUILT_IN_POW, BUILT_IN_REMAINDER, BUILT_IN_REMQUO,
BUILT_IN_SCALBLN, BUILT_IN_SCALBN, BUILT_IN_SINH, BUILT_IN_SIN,
BUILT_IN_TANH, BUILT_IN_TAN, BUILT_IN_TGAMMA): Add
DEF_EXT_LIB_FLOATN_NX_BUILTINS.
(BUILT_IN_HUGE_VAL): Use HUGE_VAL_TYPE instead of INF_TYPE in
DEF_GCC_FLOATN_NX_BUILTINS.
* fold-const-call.cc (fold_const_call_ss): Add various CASE_CFN_*_FN:
cases when CASE_CFN_* is present.
(fold_const_call_sss): Likewise.
* builtins.cc (mathfn_built_in_2): Use CASE_MATHFN_FLOATN instead of
CASE_MATHFN for various builtins in SEQ_OF_CASE_MATHFN macro.
(builtin_with_linkage_p): Add CASE_FLT_FN_FLOATN_NX for various
builtins next to CASE_FLT_FN.
* fold-const.cc (tree_call_nonnegative_warnv_p): Add CASE_CFN_*_FN:
next to CASE_CFN_*: for various builtins.
* tree-call-cdce.cc (can_test_argument_range): Add
CASE_FLT_FN_FLOATN_NX next to CASE_FLT_FN for various builtins.
(edom_only_function): Likewise.

[RFC PATCH] libstdc++, v2: Partial library support for std::float{16,32,64,128}_t

2022-10-16 Thread Jakub Jelinek via Gcc-patches
Hi!

As the __bf16 support is now in at least on x86_64/i686, I've
updated my patch to cover bfloat16_t as well and implemented almost
everything for  - the only thing missing I'm aware of is
std::nextafter std::float16_t and std::bfloat16_t overloads (I think
we probably need to implement that out of line somewhere, or inline? - might
need inline asm barriers) and std::nexttoward overloads (those are
intentional, you said there is a LWG issue about that).
If you want to have  done in a different way, e.g. the patch
groups a lot of different function overloads by the floating point type,
is that ok or do you want to have them one function at a time for all types,
then next?
I could try to handle  too, but am kind of lost there.
The paper dropped the explicit std::complex specializations, can they stay
around as is and should new overloads be added for the
_Float*/__gnu_cxx::__bfloat16_t types?
And I/O etc. support is missing, not sure I'm able to handle that and if it
is e.g. possible to keep that support out of libstdc++.so.6, because what
extended floating point types one has on a particular arch could change over
time (I mean e.g. bfloat16_t support or float16_t support can be added
etc.).

Bootstrapped/regtested on x86_64-linux and i686-linux.

2022-10-15  Jakub Jelinek  

* include/std/stdfloat: New file.
* include/std/numbers (__glibcxx_numbers): Define and use it
for __float128 explicit instantiations as well as
_Float{16,32,64,128} and __gnu_cxx::__bfloat16_t.
* include/std/atomic (atomic<_Float16>, atomic<_Float32>,
atomic<_Float64>, atomic<_Float128>, atomic<__gnu_cxx::__bfloat16_t>):
New explicit instantiations.
* include/std/type_traits (__is_floating_point_helper<_Float16>,
__is_floating_point_helper<_Float32>,
__is_floating_point_helper<_Float64>,
__is_floating_point_helper<_Float128>,
__is_floating_point_helper<__gnu_cxx::__bfloat16_t>): Likewise.
* include/std/limits (__glibcxx_concat3_, __glibcxx_concat3,
__glibcxx_float_n): Define.
(numeric_limits<_Float16>, numeric_limits<_Float32>,
numeric_limits<_Float64>, numeric_limits<_Float128>,
numeric_limits<__gnu_cxx::__bfloat16_t>): New explicit instantiations.
* include/bits/std_abs.h (abs): New overloads for
_Float{16,32,64,128} and __gnu_cxx::__bfloat16_t.
* include/bits/c++config (_GLIBCXX_LDOUBLE_IS_IEEE_BINARY128): Define
if long double is IEEE quad.
(__gnu_cxx::__bfloat16_t): New using.
* include/c_global/cmath (acos, asin, atan, atan2, ceil, cos, cosh,
exp, fabs, floor, fmod, frexp, ldexp, log, log10, modf, pow, sin,
sinh, sqrt, tan, tanh, fpclassify, isfinite, isinf, isnan, isnormal,
signbit, isgreater, isgreaterequal, isless, islessequal,
islessgreater, isunordered, acosh, asinh, atanh, cbrt, copysign, erf,
erfc, exp2, expm1, fdim, fma, fmax, fmin, hypot, ilogb, lgamma,
llrint, llround, log1p, log2, logb, lrint, lround, nearbyint,
nextafter, remainder, rint, round, scalbln, scalbn, tgamma, trunc,
lerp): New overloads with _Float{16,32,64,128} or
__gnu_cxx::__bfloat16_t types.
* config/os/gnu-linux/os_defines.h (_GLIBCXX_HAVE_FLOAT128_MATH):
Define if glibc 2.26 and later implements *f128 APIs.
* include/ext/type_traits.h (__promote<_Float16>, __promote<_Float32>,
__promote<_Float64>, __promote<_Float128>,
__promote<__gnu_cxx::__bfloat16_t>): New specializations.
* include/Makefile.am (std_headers): Add stdfloat.
* include/Makefile.in: Regenerated.
* include/precompiled/stdc++.h: Include stdfloat.
* testsuite/18_support/headers/stdfloat/types_std.cc: New test.
* testsuite/18_support/headers/limits/synopsis_cxx23.cc: New test.
* testsuite/26_numerics/headers/cmath/c99_classification_macros_c++23.cc:
New test.
* testsuite/26_numerics/headers/cmath/functions_std_c++23.cc: New test.
* testsuite/26_numerics/numbers/4.cc: New test.
* testsuite/29_atomics/atomic_float/requirements_cxx23.cc: New test.

--- libstdc++-v3/include/std/stdfloat.jj   2022-10-14 22:32:55.409346491 +0200
+++ libstdc++-v3/include/std/stdfloat   2022-10-14 22:32:55.408346505 +0200
@@ -0,0 +1,62 @@
+//  -*- C++ -*-
+
+// Copyright (C) 2022 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for m

[committed] Rename "Z" constraint on H8/300 to "Zz".

2022-10-16 Thread Jeff Law via Gcc-patches

I want to use Z for multi-letter constraints.  So first we have to
adjust the existing use of Z.  This does not affect code generation.


Pushed to the trunk,


Jeff


commit 709b2160bcd8f6f57c8754c73d40550895339c7b
Author: Jeff Law 
Date:   Sun Oct 16 10:58:52 2022 -0400

Rename "Z" constraint to "Zz" on the H8/300

I want to use Z as a multi-letter constraint.  So first we have to
adjust the existing use of Z.  This does not affect code generation.

gcc/
* config/h8300/constraints.md (Zz constraint): Renamed
from "Z".
* config/h8300/movepush.md (movqi_h8sx, movhi_h8sx): Adjust
constraint to use Zz instead of Z.

diff --git a/gcc/config/h8300/constraints.md b/gcc/config/h8300/constraints.md
index 2836232d882..f71996c5f38 100644
--- a/gcc/config/h8300/constraints.md
+++ b/gcc/config/h8300/constraints.md
@@ -211,7 +211,7 @@
   (and (match_code "const_int")
(match_test "exact_log2 (ival & 0xff) != -1")))
 
-(define_constraint "Z"
+(define_constraint "Zz"
   "@internal"
   (and (match_test "TARGET_H8300SX")
(match_code "mem")
diff --git a/gcc/config/h8300/movepush.md b/gcc/config/h8300/movepush.md
index ada4ddd0beb..e895de8ce59 100644
--- a/gcc/config/h8300/movepush.md
+++ b/gcc/config/h8300/movepush.md
@@ -28,7 +28,7 @@
   [(set (attr "length") (symbol_ref "compute_mov_length (operands)"))])
 
 (define_insn_and_split "*movqi_h8sx"
-  [(set (match_operand:QI 0 "general_operand_dst" "=Z,rQ")
+  [(set (match_operand:QI 0 "general_operand_dst" "=Zz,rQ")
(match_operand:QI 1 "general_operand_src" "P4>X,rQi"))]
   "TARGET_H8300SX"
   "#"
@@ -37,7 +37,7 @@
  (clobber (reg:CC CC_REG))])])
 
 (define_insn "*movqi_h8sx"
-  [(set (match_operand:QI 0 "general_operand_dst" "=Z,rQ")
+  [(set (match_operand:QI 0 "general_operand_dst" "=Zz,rQ")
(match_operand:QI 1 "general_operand_src" "P4>X,rQi"))
(clobber (reg:CC CC_REG))]
   "TARGET_H8300SX"
@@ -113,7 +113,7 @@
   [(set (attr "length") (symbol_ref "compute_mov_length (operands)"))])
 
 (define_insn_and_split "*movhi_h8sx"
-  [(set (match_operand:HI 0 "general_operand_dst" "=r,r,Z,Q,rQ")
+  [(set (match_operand:HI 0 "general_operand_dst" "=r,r,Zz,Q,rQ")
(match_operand:HI 1 "general_operand_src" "I,P3>X,P4>X,IP8>X,rQi"))]
   "TARGET_H8300SX"
   "#"
@@ -122,7 +122,7 @@
  (clobber (reg:CC CC_REG))])])
   
 (define_insn "*movhi_h8sx"
-  [(set (match_operand:HI 0 "general_operand_dst" "=r,r,Z,Q,rQ")
+  [(set (match_operand:HI 0 "general_operand_dst" "=r,r,Zz,Q,rQ")
(match_operand:HI 1 "general_operand_src" "I,P3>X,P4>X,IP8>X,rQi"))
(clobber (reg:CC CC_REG))]
   "TARGET_H8300SX"


[committed] Add new constraints for upcoming autoinc fixes on the H8

2022-10-16 Thread Jeff Law via Gcc-patches


GCC does not allow the operand of an autoinc addressing mode to
overlap with another source operand in the same insn.  This is primarily
enforced with insn conditions.  However, cases can slip through LRA
and reload.  To address those scenarios we'll take an idea from the
pdp11 port for describing the restriction in constraints as well.

To implement that we need register classes and constraints which are
"all general purpose hardware registers except r0".  And similarly for
r1..r7(sp).

This patch adds those register classes and constraints, but does not
yet use them.
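
As a rough illustration of what "all general purpose hardware registers
except rN" means as a class mask, here is a minimal sketch.  It assumes
r0..r7 occupy bits 0..7 of the mask; the names not_reg_mask and check_masks
are hypothetical and are not part of the port.  The expected values are the
ones used for the new REG_CLASS_CONTENTS entries in this patch.

```c
#include <assert.h>

/* Hypothetical helper, not part of the H8 port: the set of r0..r7 with
   register REGNO removed.  Assumes r0..r7 map to bits 0..7.  */
static unsigned int
not_reg_mask (unsigned int regno)
{
  return 0xffu & ~(1u << regno);
}

/* Cross-check the computed masks against the values the patch adds to
   REG_CLASS_CONTENTS for NOT_R0_REGS .. NOT_R6_REGS and NOT_SP_REGS.  */
static void
check_masks (void)
{
  static const unsigned int expected[8] = {
    0x0fe, 0x0fd, 0x0fb, 0x0f7, 0x0ef, 0x0df, 0x0bf, 0x07f
  };

  for (unsigned int regno = 0; regno < 8; regno++)
    assert (not_reg_mask (regno) == expected[regno]);
}
```

Each mask clears exactly one of the low eight bits, so a constraint built on
such a class can never pick the register that the autoinc operand uses.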

Pushed to the trunk.


Jeff


commit 6366e3e8847af98d4728d55951534769d034d02a
Author: Jeff Law 
Date:   Sun Oct 16 12:43:25 2022 -0400

Add new constraints for upcoming autoinc fixes

GCC does not allow the operand of an autoinc addressing mode to
overlap with another source operand in the same insn.  This is primarily
enforced with insn conditions.  However, cases can slip through LRA
and reload.  To address those scenarios we'll take an idea from the
pdp11 port for describing the restriction in constraints as well.

To implement that we need register classes and constraints which are
"all general purpose hardware registers except r0".  And similarly for
r1..r7(sp).

This patch adds those register classes and constraints, but does not
yet use them.

gcc/
* config/h8300/constraints.md (Z0..Z7): New register
constraints.
* config/h8300/h8300.h (reg_class): Add new classes.
(REG_CLASS_NAMES): Similarly.
(REG_CLASS_CONTENTS): Similarly.

diff --git a/gcc/config/h8300/constraints.md b/gcc/config/h8300/constraints.md
index f71996c5f38..6eaffc16975 100644
--- a/gcc/config/h8300/constraints.md
+++ b/gcc/config/h8300/constraints.md
@@ -216,3 +216,28 @@
   (and (match_test "TARGET_H8300SX")
(match_code "mem")
(match_test "CONSTANT_P (XEXP (op, 0))")))
+
+(define_register_constraint "Z0" "NOT_R0_REGS"
+  "@internal")
+
+(define_register_constraint "Z1" "NOT_R1_REGS"
+  "@internal")
+
+(define_register_constraint "Z2" "NOT_R2_REGS"
+  "@internal")
+
+(define_register_constraint "Z3" "NOT_R3_REGS"
+  "@internal")
+
+(define_register_constraint "Z4" "NOT_R4_REGS"
+  "@internal")
+
+(define_register_constraint "Z5" "NOT_R5_REGS"
+  "@internal")
+
+(define_register_constraint "Z6" "NOT_R6_REGS"
+  "@internal")
+
+(define_register_constraint "Z7" "NOT_SP_REGS"
+  "@internal")
+
diff --git a/gcc/config/h8300/h8300.h b/gcc/config/h8300/h8300.h
index 9a6c78cf2d5..45cc4fc7796 100644
--- a/gcc/config/h8300/h8300.h
+++ b/gcc/config/h8300/h8300.h
@@ -282,6 +282,8 @@ extern const char * const *h8_reg_names;
 
 enum reg_class {
   NO_REGS, COUNTER_REGS, SOURCE_REGS, DESTINATION_REGS,
+  NOT_R0_REGS, NOT_R1_REGS, NOT_R2_REGS, NOT_R3_REGS,
+  NOT_R4_REGS, NOT_R5_REGS, NOT_R6_REGS, NOT_SP_REGS,
   GENERAL_REGS, MAC_REGS, ALL_REGS, LIM_REG_CLASSES
 };
 
@@ -291,6 +293,8 @@ enum reg_class {
 
 #define REG_CLASS_NAMES \
 { "NO_REGS", "COUNTER_REGS", "SOURCE_REGS", "DESTINATION_REGS", \
+  "NOT_R0_REGS", "NOT_R1_REGS", "NOT_R2_REGS", "NOT_R3_REGS", \
+  "NOT_R4_REGS", "NOT_R5_REGS", "NOT_R6_REGS", "NOT_SP_REGS", \
   "GENERAL_REGS", "MAC_REGS", "ALL_REGS", "LIM_REGS" }
 
 /* Define which registers fit in which classes.
@@ -302,6 +306,14 @@ enum reg_class {
{0x010},/* COUNTER_REGS */  \
{0x020},/* SOURCE_REGS */   \
{0x040},/* DESTINATION_REGS */  \
+   {0x0fe},/* NOT_R0_REGS */   \
+   {0x0fd},/* NOT_R1_REGS */   \
+   {0x0fb},/* NOT_R2_REGS */   \
+   {0x0f7},/* NOT_R3_REGS */   \
+   {0x0ef},/* NOT_R4_REGS */   \
+   {0x0df},/* NOT_R5_REGS */   \
+   {0x0bf},/* NOT_R6_REGS */   \
+   {0x07f},/* NOT_SP_REGS */   \
{0xeff},/* GENERAL_REGS */  \
{0x100},/* MAC_REGS */  \
{0xfff},/* ALL_REGS */  \


Re: [PATCH] [X86_64]: Enable support for next generation AMD Zen4 CPU

2022-10-16 Thread Uros Bizjak via Gcc-patches
On Thu, Oct 13, 2022 at 5:33 PM Joshi, Tejas Sanjay
 wrote:
>
> [Public]
>
> Hi all,
>
> PFA, the patch that enables support for the next generation AMD Zen4 CPU via 
> -march=znver4.
> This is a basic enablement patch and as of now the costings, tunings are kept 
> same as znver3.
>
> Good for trunk?

2022-09-28  Tejas Joshi 

gcc/ChangeLog:

* common/config/i386/cpuinfo.h (get_amd_cpu): Recognize znver4.
* common/config/i386/i386-common.cc (processor_names): Add znver4.
(processor_alias_table): Add znver4 and modularize old znvers.
* common/config/i386/i386-cpuinfo.h (processor_subtypes):
AMDFAM19H_ZNVER4.
* config.gcc (x86_64-*-* |...): Likewise.
* config/i386/driver-i386.cc (host_detect_local_cpu): Let
-march=native recognize znver4 cpus.
* config/i386/i386-c.cc (ix86_target_macros_internal): Add znver4.
* config/i386/i386-options.cc (m_ZNVER4): New definition.
(m_ZNVER): Include m_ZNVER4.
(processor_cost_table): Add znver4.
* config/i386/i386.cc (ix86_reassociation_width): Likewise.
* gcc/config/i386/i386.h (processor_type): Add PROCESSOR_ZNVER4.
(PTA_ZNVER1): New definition.
(PTA_ZNVER2): Likewise.
(PTA_ZNVER3): Likewise.
(PTA_ZNVER4): Likewise.
* config/i386/i386.md (define_attr "cpu"): Add znver4.
* config/i386/x86-tune-costs.h (znver4_cost): New definition.
* config/i386/x86-tune-sched.cc (ix86_issue_rate): Add znver4.
(ix86_adjust_cost): Likewise.
* config/i386/znver1.md: Add new reservations for znver4.
* doc/extend.texi: Add details about znver4.
* doc/invoke.texi: Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/i386/funcspec-56.inc: Handle new march.
* g++.target/i386/mv29.C: Likewise.

Although I didn't check all the details of the new scheduler model,
the patch LGTM for mainline.

BTW: Perhaps znver1.md is not the right filename anymore, since it
hosts all four Zen schedulers.

Thanks,
Uros.


[PATCH] microblaze: use strverscmp() in MICROBLAZE_VERSION_COMPARE()

2022-10-16 Thread Ovidiu Panait via Gcc-patches
Currently, combining '-mxl-multiply-high' with -mcpu=v11.0 produces the
following bogus warning:

  echo "int main(){}" | ./microblazeel-linux-gnu-gcc -mxl-multiply-high \
  -mno-xl-soft-mul -mcpu=v11.0 -nostdlib -x c -
  warning: '-mxl-multiply-high' can be used only with '-mcpu=v6.00.a' or greater

Since strcasecmp() doesn't properly compare single-digit cpu versions with
double-digit versions, switch MICROBLAZE_VERSION_COMPARE() to use strverscmp()
instead.
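
To make the difference concrete, here is a small sketch comparing the two
functions on the version strings from the warning above.  The helper and
macro names are hypothetical stand-ins, and it assumes the caller treats a
negative comparison result as "older than the required version".  Note that
strverscmp() is a GNU extension, so _GNU_SOURCE is defined before the
includes.

```c
#define _GNU_SOURCE   /* strverscmp() is a GNU extension.  */
#include <assert.h>
#include <string.h>
#include <strings.h>

/* Hypothetical stand-ins for the old and new macro bodies.  */
#define VERSION_COMPARE_OLD(VA, VB) strcasecmp (VA, VB)
#define VERSION_COMPARE_NEW(VA, VB) strverscmp (VA, VB)

/* Return nonzero when CPU is ordered at or after MINIMUM.  */
static int
cpu_at_least_old (const char *cpu, const char *minimum)
{
  return VERSION_COMPARE_OLD (cpu, minimum) >= 0;
}

static int
cpu_at_least_new (const char *cpu, const char *minimum)
{
  return VERSION_COMPARE_NEW (cpu, minimum) >= 0;
}

/* The case from the report: v11.0 should count as newer than v6.00.a.  */
static void
check_report_case (void)
{
  /* Character-wise comparison sees '1' < '6' and gets it wrong.  */
  assert (!cpu_at_least_old ("v11.0", "v6.00.a"));
  /* Version comparison treats 11 and 6 as numbers.  */
  assert (cpu_at_least_new ("v11.0", "v6.00.a"));
}
```

Single-digit versions still order the same way with either function, so only
the multi-digit case changes behavior.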

* config/microblaze/microblaze.cc (MICROBLAZE_VERSION_COMPARE): Use
strverscmp() to fix bogus warnings when passing multi-digit -mcpu
versions on the command line.

Signed-off-by: Ovidiu Panait 
---
 gcc/config/microblaze/microblaze.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/microblaze/microblaze.cc b/gcc/config/microblaze/microblaze.cc
index 8fcca1829f6..28a2a9596d1 100644
--- a/gcc/config/microblaze/microblaze.cc
+++ b/gcc/config/microblaze/microblaze.cc
@@ -56,7 +56,7 @@
 /* This file should be included last.  */
 #include "target-def.h"
 
-#define MICROBLAZE_VERSION_COMPARE(VA,VB) strcasecmp (VA, VB)
+#define MICROBLAZE_VERSION_COMPARE(VA,VB) strverscmp (VA, VB)
 
 /* Classifies an address.
 
-- 
2.25.1



[PATCH] Fortran: check type of operands of logical operations, comparisons [PR107272]

2022-10-16 Thread Harald Anlauf via Gcc-patches
Dear all,

this PR is actually very related to PR107217 that addressed ICEs
with bad array constructors with typespec when used in arithmetic
expressions.  The present patch extends the checking to logical
operations and to comparisons and catches several ICE-on-invalid
as well as a few cases of accepts-invalid.

Regtested on x86_64-pc-linux-gnu.  OK for mainline?

Thanks,
Harald

From 779baf06888f3adef13c12c468c0a5ef0a45f93e Mon Sep 17 00:00:00 2001
From: Harald Anlauf 
Date: Sun, 16 Oct 2022 20:32:27 +0200
Subject: [PATCH] Fortran: check type of operands of logical operations,
 comparisons [PR107272]

gcc/fortran/ChangeLog:

	PR fortran/107272
	* arith.cc (gfc_arith_not): Operand must be of type BT_LOGICAL.
	(gfc_arith_and): Likewise.
	(gfc_arith_or): Likewise.
	(gfc_arith_eqv): Likewise.
	(gfc_arith_neqv): Likewise.
	(gfc_arith_eq): Compare consistency of types of operands.
	(gfc_arith_ne): Likewise.
	(gfc_arith_gt): Likewise.
	(gfc_arith_ge): Likewise.
	(gfc_arith_lt): Likewise.
	(gfc_arith_le): Likewise.

gcc/testsuite/ChangeLog:

	PR fortran/107272
	* gfortran.dg/pr107272.f90: New test.
---
 gcc/fortran/arith.cc   | 33 ++
 gcc/testsuite/gfortran.dg/pr107272.f90 | 21 
 2 files changed, 54 insertions(+)
 create mode 100644 gcc/testsuite/gfortran.dg/pr107272.f90

diff --git a/gcc/fortran/arith.cc b/gcc/fortran/arith.cc
index c8e882badab..fc9224ebc5c 100644
--- a/gcc/fortran/arith.cc
+++ b/gcc/fortran/arith.cc
@@ -422,6 +422,9 @@ gfc_arith_not (gfc_expr *op1, gfc_expr **resultp)
 {
   gfc_expr *result;

+  if (op1->ts.type != BT_LOGICAL)
+return ARITH_INVALID_TYPE;
+
   result = gfc_get_constant_expr (BT_LOGICAL, op1->ts.kind, &op1->where);
   result->value.logical = !op1->value.logical;
   *resultp = result;
@@ -435,6 +438,9 @@ gfc_arith_and (gfc_expr *op1, gfc_expr *op2, gfc_expr **resultp)
 {
   gfc_expr *result;

+  if (op1->ts.type != BT_LOGICAL || op2->ts.type != BT_LOGICAL)
+return ARITH_INVALID_TYPE;
+
   result = gfc_get_constant_expr (BT_LOGICAL, gfc_kind_max (op1, op2),
   &op1->where);
   result->value.logical = op1->value.logical && op2->value.logical;
@@ -449,6 +455,9 @@ gfc_arith_or (gfc_expr *op1, gfc_expr *op2, gfc_expr **resultp)
 {
   gfc_expr *result;

+  if (op1->ts.type != BT_LOGICAL || op2->ts.type != BT_LOGICAL)
+return ARITH_INVALID_TYPE;
+
   result = gfc_get_constant_expr (BT_LOGICAL, gfc_kind_max (op1, op2),
   &op1->where);
   result->value.logical = op1->value.logical || op2->value.logical;
@@ -463,6 +472,9 @@ gfc_arith_eqv (gfc_expr *op1, gfc_expr *op2, gfc_expr **resultp)
 {
   gfc_expr *result;

+  if (op1->ts.type != BT_LOGICAL || op2->ts.type != BT_LOGICAL)
+return ARITH_INVALID_TYPE;
+
   result = gfc_get_constant_expr (BT_LOGICAL, gfc_kind_max (op1, op2),
   &op1->where);
   result->value.logical = op1->value.logical == op2->value.logical;
@@ -477,6 +489,9 @@ gfc_arith_neqv (gfc_expr *op1, gfc_expr *op2, gfc_expr **resultp)
 {
   gfc_expr *result;

+  if (op1->ts.type != BT_LOGICAL || op2->ts.type != BT_LOGICAL)
+return ARITH_INVALID_TYPE;
+
   result = gfc_get_constant_expr (BT_LOGICAL, gfc_kind_max (op1, op2),
   &op1->where);
   result->value.logical = op1->value.logical != op2->value.logical;
@@ -1187,6 +1202,9 @@ gfc_arith_eq (gfc_expr *op1, gfc_expr *op2, gfc_expr **resultp)
 {
   gfc_expr *result;

+  if (op1->ts.type != op2->ts.type)
+return ARITH_INVALID_TYPE;
+
   result = gfc_get_constant_expr (BT_LOGICAL, gfc_default_logical_kind,
   &op1->where);
   result->value.logical = (op1->ts.type == BT_COMPLEX)
@@ -1203,6 +1221,9 @@ gfc_arith_ne (gfc_expr *op1, gfc_expr *op2, gfc_expr **resultp)
 {
   gfc_expr *result;

+  if (op1->ts.type != op2->ts.type)
+return ARITH_INVALID_TYPE;
+
   result = gfc_get_constant_expr (BT_LOGICAL, gfc_default_logical_kind,
   &op1->where);
   result->value.logical = (op1->ts.type == BT_COMPLEX)
@@ -1219,6 +1240,9 @@ gfc_arith_gt (gfc_expr *op1, gfc_expr *op2, gfc_expr **resultp)
 {
   gfc_expr *result;

+  if (op1->ts.type != op2->ts.type)
+return ARITH_INVALID_TYPE;
+
   result = gfc_get_constant_expr (BT_LOGICAL, gfc_default_logical_kind,
   &op1->where);
   result->value.logical = (gfc_compare_expr (op1, op2, INTRINSIC_GT) > 0);
@@ -1233,6 +1257,9 @@ gfc_arith_ge (gfc_expr *op1, gfc_expr *op2, gfc_expr **resultp)
 {
   gfc_expr *result;

+  if (op1->ts.type != op2->ts.type)
+return ARITH_INVALID_TYPE;
+
   result = gfc_get_constant_expr (BT_LOGICAL, gfc_default_logical_kind,
   &op1->where);
   result->value.logical = (gfc_compare_expr (op1, op2, INTRINSIC_GE) >= 0);
@@ -1247,6 +1274,9 @@ gfc_arith_lt (gfc_expr *op1, gfc_expr *op2, gfc_expr **resultp)
 {
   gfc_expr *result;

+  if (op1->ts.type != op2->ts.type)
+return ARITH_INVALID_TYPE;
+
   result = gfc_get_constant_expr (BT_LOGICAL, gfc_default_logical_kind,
   &op1->where);
   result->value.logical = (gfc_compare_expr (op1, op2, INTRINSIC_LT)

Re: [Patch] Fortran: Fixes for kind=4 characters strings [PR107266]

2022-10-16 Thread Harald Anlauf via Gcc-patches

Hi Tobias,

the patch LGTM.

Regarding testcase char4_decl-2.f90, I played a little and found that
one could in addition check the storage_size of aa, pp in the main and
compare with storage_size (4_'foo') etc.  Without your patch the
storage sizes look odd.  (Strictly speaking, a comparison like
  if (aa .ne. 4_'foo') stop 123
is not fully sufficient to catch such oddities.)

Thanks,
Harald


On 14.10.22 at 23:18, Tobias Burnus wrote:

Long introduction - but the patch is rather simple: Don't use kind=1
as type where kind=4 should be used.

Long introduction + background, feel free to skip.



This popped up for libgomp/testsuite/libgomp.fortran/struct-elem-map-1.f90
which uses kind=4 characters – if Sandra's "Fortran: delinearize
multi-dimensional array accesses" patch is applied.

Patch: https://gcc.gnu.org/pipermail/gcc-patches/2020-December/562230.html
Used for OG11:
https://gcc.gnu.org/pipermail/gcc-patches/2021-November/584716.html
On the OG12 alias devel/omp/gcc-12 vendor branch, it is used:
https://gcc.gnu.org/g:39a8c371fda6136cf77c74895a00b136409e0ba3

* * *

For mainline, I did not observe a wrong-code issue at runtime, still:

void frobc (character(kind=4)[1:*_a] * & restrict a, ...
...
static void frobc (character(kind=1) * & restrict, ...

feels odd, i.e. having the definition as kind=4 and the declaration as
kind=1.
With the patch, it becomes:

static void frobc (character(kind=4) * & restrict, character(kind=4) * &, ...

  * * *

For the following, questionable code (→ PR107266), it is even worse:

character(kind=4) function f(x) bind(C)
   character(kind=4), value :: x
end

this gives the following, which has the wrong ABI:

character(kind=1) f (character(kind=1) x)
{
   (void) 0;
}

With the patch, it becomes:
   character(kind=4) f (character(kind=4) x)

  * * *

I think all of that only exercises the trans-type.cc part of the patch;
the trans-expr.cc code does get called, as an assert shows,
but I failed to get a dump where this goes wrong.

However, for struct-elem-map-1.f90 with mainline or with
OG12 and the patch:
   #pragma omp target map(tofrom:var.uni2[40 / 20] [len: 20])

while on OG12 without the attached patch:
   #pragma omp target map(tofrom:var.uni2[40 / 5] [len: 5])

where the problem is that TYPE_SIZE_UNIT is wrong. Whether
this only affects OG12 due to the delinearizer patch or
some code on mainline as well, I don't know.

Still, I think it should be fixed ...



OK for mainline?

Tobias




[PATCH] Don't print discriminators for -fcompare-debug.

2022-10-16 Thread Eugene Rozenfeld via Gcc-patches
With -gstatement-frontiers we may end up with different IR
coming from the front end with and without debug information turned on.
See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100733 for details.
That may result in differences in discriminator values and -fcompare-debug
failures.

This patch disables printing of discriminators when the dump is intended
for -fcompare-debug comparison and reverses the workaround in a test.

Tested on x86_64-pc-linux-gnu.

gcc/ChangeLog:
PR debug/107231
PR debug/107169
* print-rtl.cc (print_rtx_operand_code_i): Don't print discriminators
for -fcompare-debug.

gcc/testsuite/ChangeLog:

* c-c++-common/ubsan/pr85213.c: Reverse the workaround for discriminators.
---
 gcc/print-rtl.cc   | 13 ++---
 gcc/testsuite/c-c++-common/ubsan/pr85213.c |  7 +--
 2 files changed, 11 insertions(+), 9 deletions(-)

diff --git a/gcc/print-rtl.cc b/gcc/print-rtl.cc
index e115f987173..0476f3d7e79 100644
--- a/gcc/print-rtl.cc
+++ b/gcc/print-rtl.cc
@@ -453,10 +453,17 @@ rtx_writer::print_rtx_operand_code_i (const_rtx in_rtx, int idx)
  expanded_location xloc = insn_location (in_insn);
  fprintf (m_outfile, " \"%s\":%i:%i", xloc.file, xloc.line,
   xloc.column);
- int discriminator = insn_discriminator (in_insn);
-   if (discriminator)
- fprintf (m_outfile, " discrim %d", discriminator);
 
+ /* Don't print discriminators for -fcompare-debug since the IR
+coming from the front end may be different with and without
+debug information turned on. That may result in different
+discriminator values. */
+ if (!(dump_flags & TDF_COMPARE_DEBUG))
+   {
+ int discriminator = insn_discriminator (in_insn);
+ if (discriminator)
+   fprintf (m_outfile, " discrim %d", discriminator);
+   }
}
 #endif
 }
diff --git a/gcc/testsuite/c-c++-common/ubsan/pr85213.c b/gcc/testsuite/c-c++-common/ubsan/pr85213.c
index e903e976f2c..8a6be81d20f 100644
--- a/gcc/testsuite/c-c++-common/ubsan/pr85213.c
+++ b/gcc/testsuite/c-c++-common/ubsan/pr85213.c
@@ -1,11 +1,6 @@
 /* PR sanitizer/85213 */
 /* { dg-do compile } */
-/* Pass -gno-statement-frontiers to work around
-   https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100733 :
-   without it the IR coming from the front end may be different with and without
-   debug information turned on. That may cause e.g., different discriminator values
-   and -fcompare-debug failures. */
-/* { dg-options "-O1 -fsanitize=undefined -fcompare-debug -gno-statement-frontiers" } */
+/* { dg-options "-O1 -fsanitize=undefined -fcompare-debug" } */
 
 int
 foo (int x)
-- 
2.25.1


Re: [PATCH] Fortran: check type of operands of logical operations, comparisons [PR107272]

2022-10-16 Thread Mikael Morin

On 16/10/2022 at 20:46, Harald Anlauf via Fortran wrote:

Dear all,

this PR is actually very related to PR107217 that addressed ICEs
with bad array constructors with typespec when used in arithmetic
expressions.  The present patch extends the checking to logical
operations and to comparisons and catches several ICE-on-invalid
as well as a few cases of accepts-invalid.

Regtested on x86_64-pc-linux-gnu.  OK for mainline?


Yes, thanks.


Re: [PATCH, v2] Fortran: handle bad array ctors with typespec [PR93483, , PR107216, PR107219]

2022-10-16 Thread Mikael Morin

On 15/10/2022 at 22:15, Harald Anlauf via Fortran wrote:

Dear all,

here is an updated version of the patch that includes suggestions
and comments by Mikael in PR93483.

Basic new features are:
- a new enum value ARITH_NOT_REDUCED to keep track if we encountered
   an expression that was not reduced via reduce_unary/reduce_binary
- a cleanup of the related checking, resulting in more readable
   code.
- a new testcase by Mikael that exhibited a flaw in the first patch
   due to a false resolution of a symbol by premature simplification.

Regtested again.  OK for mainline?


(...)

diff --git a/gcc/fortran/gfortran.h b/gcc/fortran/gfortran.h
index 10bb098d136..7b8f0b148bd 100644
--- a/gcc/fortran/gfortran.h
+++ b/gcc/fortran/gfortran.h
@@ -222,11 +222,12 @@ enum gfc_intrinsic_op
Assumptions are made about the numbering of the interface_op enums.  */
 #define GFC_INTRINSIC_OPS GFC_INTRINSIC_END
 
-/* Arithmetic results.  */

+/* Arithmetic results.  ARITH_NOT_REDUCED is used to keep track of failed
+   reductions because an erroneous expression was encountered.  */


The expressions are not always erroneous.  They can be, but in the 
testcase for example, all the expressions are valid.  They are just 
unsupported by the arithmetic evaluation code which works only with 
literal constants and arrays of literal constants (and arrays of arrays 
etc).


OK with that comment fixed.

Thanks.


Re: [PATCH 0/6] Add Intel Sierra Forest Instructions

2022-10-16 Thread Hongtao Liu via Gcc-patches
On Fri, Oct 14, 2022 at 4:36 PM Iain Sandoe  wrote:
>
>
>
> > On 14 Oct 2022, at 09:30, Hongtao Liu  wrote:
> >
> > On Fri, Oct 14, 2022 at 4:24 PM Iain Sandoe  wrote:
> >>
> >>
> >>
> >>> On 14 Oct 2022, at 09:20, Hongtao Liu  wrote:
> >>>
> >>> On Fri, Oct 14, 2022 at 4:14 PM Iain Sandoe via Gcc-patches
> >>>  wrote:
> 
>  Hi Haochen
> 
> > On 14 Oct 2022, at 08:54, Haochen Jiang via Gcc-patches 
> >  wrote:
> >
> 
> > These six patches aim to add Intel Sierra Forest instructions, including
> > AVX-IFMA, AVX-VNNI-INT8, AVX-NE-CONVERT and CMPccXADD. We also add
> > intrinsics for vector __bf16 and Sierra Forest support in this series.
> >
> > The information is based on newly released
> > Intel Architecture Instruction Set Extensions and Future Features.
> >
> > The document is available at:
> > https://www.intel.com/content/www/us/en/develop/download/intel-architecture-instruction-set-extensions-programming-reference.html
> >
> > Regtested on x86_64-pc-linux-gnu. Ok for trunk?
> 
>  Have you tested that the testcases work on older platforms that do not
>  have support for the new instructions in their assemblers?
> 
>  I could not see any target-requires changes in the testcases, hence my
>  question.
> 
> >>> Guess you are looking at compile tests?
> >>
> >> yes, compile tests would need support from the assembler.
>
> oops, not enough coffee - I’m talking rubbish here - assembler output should 
> be fine,
>
> >>> For runtime tests, we have added an assembler check (target-requires
> >>> changed) plus a runtime check (builtin_cpu_supports),
> >>> i.e.
> >>>
> >>> +++ b/gcc/testsuite/gcc.target/i386/avx-ifma-vpmaddhuq-2.c
> >>> @@ -0,0 +1,72 @@
> >>> +/* { dg-do run } */
> >>> +/* { dg-options "-O2 -mavxifma" } */
> >>> +/* { dg-require-effective-target avxifma } */
> >>>
> >>> Did I miss some?
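
The runtime half of that scheme can be sketched in C as follows. This is only an illustration of the guard pattern, not the code from the patch: the helper names run_if_avx2, sample_body and body_ran are hypothetical, and the feature string "avx2" is used as an assumption because the new feature names only exist once the series is applied. The __builtin_cpu_init and __builtin_cpu_supports builtins are the existing GCC x86 builtins.

```c
#include <assert.h>
#include <stddef.h>

/* Set by the sample test body so a caller can see whether it ran.  */
int body_ran = 0;

/* Hypothetical test body standing in for the real instruction test.  */
void
sample_body (void)
{
  body_ran = 1;
}

/* Run BODY only if the CPU reports the feature at run time.
   Returns 1 if BODY ran and 0 if the test was skipped.  */
int
run_if_avx2 (void (*body) (void))
{
  assert (body != NULL);
#if defined (__GNUC__) && (defined (__x86_64__) || defined (__i386__))
  __builtin_cpu_init ();
  if (__builtin_cpu_supports ("avx2"))
    {
      body ();
      return 1;
    }
#endif
  return 0;  /* Unsupported CPU or non-x86 target: skip silently.  */
}
```

The assembler-side check is separate: the dg-require-effective-target line keeps the test from being built at all when the toolchain cannot assemble the new instructions.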
> >>
> >> I would need to look at the sources after patching (perhaps they already 
> >> have
> >> suitable target-requires that did not show up in the patch).
> >>
> >> Do you have this series as a branch somewhere that I can try on one of the
> >> likely affected platforms?
> >
> > Not yet.
> > Do we have any external place to put those patches so folks from the
> > community can validate before it's committed, HJ?
>
> I’d still like to be able to test if that can be done - I’ve already got a 
> large number of
> fails from new testcases in earlier additions.
I've pushed those patches to a public repository:
https://gitlab.com/x86-gcc/gcc/-/tree/users/intel/liuhongt/upstream
If you're also interested in the Binutils patches, they are at
https://gitlab.com/x86-binutils/binutils-gdb/-/tree/users/intel/liuhongt/upstream
>
> Iain



-- 
BR,
Hongtao


Re: [PATCH 1/2] Initial Raptorlake Support

2022-10-16 Thread Hongtao Liu via Gcc-patches
On Fri, Oct 14, 2022 at 3:41 PM Haochen Jiang via Gcc-patches
 wrote:
>
> gcc/ChangeLog:
>
> * common/config/i386/cpuinfo.h:
> (get_intel_cpu): Handle Raptorlake.
> * common/config/i386/i386-common.cc:
> (processor_alias_table): Add Raptorlake.
Ok.
> ---
>  gcc/common/config/i386/cpuinfo.h  | 2 ++
>  gcc/common/config/i386/i386-common.cc | 2 ++
>  2 files changed, 4 insertions(+)
>
> diff --git a/gcc/common/config/i386/cpuinfo.h 
> b/gcc/common/config/i386/cpuinfo.h
> index bbced8a23b9..e759e6f89fa 100644
> --- a/gcc/common/config/i386/cpuinfo.h
> +++ b/gcc/common/config/i386/cpuinfo.h
> @@ -496,6 +496,8 @@ get_intel_cpu (struct __processor_model *cpu_model,
>  case 0x9a:
>  case 0xbf:
>/* Alder Lake.  */
> +case 0xb7:
> +  /* Raptor Lake.  */
>cpu = "alderlake";
>CHECK___builtin_cpu_is ("corei7");
>CHECK___builtin_cpu_is ("alderlake");
> diff --git a/gcc/common/config/i386/i386-common.cc 
> b/gcc/common/config/i386/i386-common.cc
> index c0c2ad74d87..8d346245ddd 100644
> --- a/gcc/common/config/i386/i386-common.cc
> +++ b/gcc/common/config/i386/i386-common.cc
> @@ -1929,6 +1929,8 @@ const pta processor_alias_table[] =
>  M_CPU_SUBTYPE (INTEL_COREI7_SAPPHIRERAPIDS), P_PROC_AVX512F},
>{"alderlake", PROCESSOR_ALDERLAKE, CPU_HASWELL, PTA_ALDERLAKE,
>  M_CPU_SUBTYPE (INTEL_COREI7_ALDERLAKE), P_PROC_AVX2},
> +  {"raptorlake", PROCESSOR_ALDERLAKE, CPU_HASWELL, PTA_ALDERLAKE,
> +M_CPU_SUBTYPE (INTEL_COREI7_ALDERLAKE), P_PROC_AVX2},
>{"bonnell", PROCESSOR_BONNELL, CPU_ATOM, PTA_BONNELL,
>  M_CPU_TYPE (INTEL_BONNELL), P_PROC_SSSE3},
>{"atom", PROCESSOR_BONNELL, CPU_ATOM, PTA_BONNELL,
> --
> 2.18.1
>


-- 
BR,
Hongtao


Re: [PATCH 2/2] Initial Meteorlake Support

2022-10-16 Thread Hongtao Liu via Gcc-patches
On Fri, Oct 14, 2022 at 3:41 PM Haochen Jiang via Gcc-patches
 wrote:
>
> From: "Hu, Lin1" 
>
> gcc/ChangeLog:
>
> * common/config/i386/cpuinfo.h:
> (get_intel_cpu): Handle Meteorlake.
> * common/config/i386/i386-common.cc:
> (processor_alias_table): Add Meteorlake.
Ok.
> ---
>  gcc/common/config/i386/cpuinfo.h  | 4 
>  gcc/common/config/i386/i386-common.cc | 2 ++
>  2 files changed, 6 insertions(+)
>
> diff --git a/gcc/common/config/i386/cpuinfo.h 
> b/gcc/common/config/i386/cpuinfo.h
> index e759e6f89fa..b5c1b21e554 100644
> --- a/gcc/common/config/i386/cpuinfo.h
> +++ b/gcc/common/config/i386/cpuinfo.h
> @@ -498,6 +498,10 @@ get_intel_cpu (struct __processor_model *cpu_model,
>/* Alder Lake.  */
>  case 0xb7:
>/* Raptor Lake.  */
> +case 0xb5:
> +case 0xaa:
> +case 0xac:
> +  /* Meteor Lake.  */
>cpu = "alderlake";
>CHECK___builtin_cpu_is ("corei7");
>CHECK___builtin_cpu_is ("alderlake");
> diff --git a/gcc/common/config/i386/i386-common.cc 
> b/gcc/common/config/i386/i386-common.cc
> index 8d346245ddd..d6a68dc9b1d 100644
> --- a/gcc/common/config/i386/i386-common.cc
> +++ b/gcc/common/config/i386/i386-common.cc
> @@ -1931,6 +1931,8 @@ const pta processor_alias_table[] =
>  M_CPU_SUBTYPE (INTEL_COREI7_ALDERLAKE), P_PROC_AVX2},
>{"raptorlake", PROCESSOR_ALDERLAKE, CPU_HASWELL, PTA_ALDERLAKE,
>  M_CPU_SUBTYPE (INTEL_COREI7_ALDERLAKE), P_PROC_AVX2},
> +  {"meteorlake", PROCESSOR_ALDERLAKE, CPU_HASWELL, PTA_ALDERLAKE,
> +M_CPU_SUBTYPE (INTEL_COREI7_ALDERLAKE), P_PROC_AVX2},
>{"bonnell", PROCESSOR_BONNELL, CPU_ATOM, PTA_BONNELL,
>  M_CPU_TYPE (INTEL_BONNELL), P_PROC_SSSE3},
>{"atom", PROCESSOR_BONNELL, CPU_ATOM, PTA_BONNELL,
> --
> 2.18.1
>
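
For readers skimming the cpuinfo.h hunks, the effect of stacking the new case labels can be sketched in C as below. This is a simplified illustration under stated assumptions: intel_model_to_cpu_name is a hypothetical helper, only the model numbers visible in the quoted hunks are listed, and the real get_intel_cpu does more than return a name.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: map an Intel model number to the name used for
   tuning.  Only the model numbers visible in the quoted hunks appear
   here; all of them share the "alderlake" entry.  */
const char *
intel_model_to_cpu_name (unsigned int model)
{
  assert (model <= 0xff);  /* Sketch only handles 8-bit model numbers.  */
  switch (model)
    {
    case 0x9a:
    case 0xbf:
      /* Alder Lake.  */
    case 0xb7:
      /* Raptor Lake.  */
    case 0xb5:
    case 0xaa:
    case 0xac:
      /* Meteor Lake.  */
      return "alderlake";

    default:
      return NULL;  /* Not covered by this sketch.  */
    }
}
```

Because the labels carry no statements of their own, every listed model falls through to the same return, which mirrors how the patch reuses the Alder Lake settings for the newer parts.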


-- 
BR,
Hongtao


Re: [PATCH v3] Re: OpenMP: Generate SIMD clones for functions with "declare target"

2022-10-16 Thread Sandra Loosemore

On 9/30/22 04:37, Jakub Jelinek wrote:


We've discussed this at Cauldron.  Especially for this patch, but less
urgently for explicit declare simd on non-exported functions (less urgently
just because people don't mark everything declare simd usually) solving the
above is essential.  I don't say it can't be done incrementally, but if the
patch is added to trunk, it needs to be solved before 13 branches.
We need to arrange cgraph to process the declare simd clones after the
callers of the corresponding main function, so that by the time we try to
post-IPA optimize the clones we can see if they were actually used or not
and if not, throw them away.

On the other side, for the implicit declare simd (in explicit case it is
user's choice), maybe it might be useful to actually see if the function clone
is vectorizable before deciding whether to actually make use of it.
Because I doubt it will be a good optimization if we clone it, push
arguments into vectors, then because vectorization failed take it apart,
do a serial loop, create return vector from the scalar results and return.
Though, thinking more about it, for the amdgcn case maybe it is worth even
in that case if we manage to vectorize the caller.  Because if failed
vectorization on amdgcn means we perform significantly slower, it can be
helpful to have even partial vectorization, vectorize statements that can
be vectorized and for others use a scalar loop.  Our vectorizer is not
prepared to do that right now I believe (which is why e.g. for
#pragma omp ordered simd we just make the whole loop non-vectorizable,
rather than using a scalar loop for stuff in there and vectorize the rest),
but with this optimization we'd effectively achieve that at least at
function call boundaries (though, only in one direction, if the caller can
be vectorized and callee can't; no optimization if caller can't and callee
could be).
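
The "take it apart, do a serial loop, rebuild the vector" case described above can be pictured with a hand-written C sketch. It uses the GNU vector extension and hypothetical function names (scale_and_clamp, scale_and_clamp_simd4); it is an illustration of the shape of such a clone, not what GCC actually emits for a SIMD clone.

```c
#include <assert.h>

/* Four-lane float vector via the GNU vector extension.  */
typedef float v4sf __attribute__ ((vector_size (16)));

/* Scalar function standing in for the body of a "declare simd"
   function.  */
float
scale_and_clamp (float x)
{
  float y = 2.0f * x;
  return y > 10.0f ? 10.0f : y;
}

/* What a four-lane clone amounts to when its body is not vectorized:
   unpack each lane, call the scalar code serially, repack the results.  */
v4sf
scale_and_clamp_simd4 (v4sf x)
{
  v4sf result = { 0.0f, 0.0f, 0.0f, 0.0f };
  for (int lane = 0; lane < 4; lane++)
    result[lane] = scale_and_clamp (x[lane]);
  return result;
}
```

Whether the packing and unpacking overhead is worth paying depends on whether the caller's surrounding loop gets vectorized, which is the trade-off discussed in the paragraph above.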


My sense is that the first approach would be more straightforward than 
the second one, and I am willing to continue to work on that.  However, 
I think I need some direction to get started, as I presently know 
nothing about cgraph and I was unable to find any useful overview or 
interface documentation in the GCC internals manual.  Is this as simple 
as inserting an existing pass into the passlist to clean up after 
vectorization, or does it involve writing something more or less from 
scratch?





+  /* OpenMP directives are not permitted.  */
+CASE_GIMPLE_OMP:
+  return false;


This makes no sense.  The function is called on low GIMPLE during IPA,
there are no GOMP_* statements at this point in the IL, everything has
been expanded.  Most of OpenMP directives though end up calling
libgomp APIs which aren't pure/const and don't have declare simd
attribute...
Exceptions can be, say, the master construct or a static-scheduling nowait
worksharing loop.


+  /* Conservatively reject all EH-related constructs.  */
+case GIMPLE_CATCH:
+case GIMPLE_EH_FILTER:
+case GIMPLE_EH_MUST_NOT_THROW:
+case GIMPLE_EH_ELSE:
+case GIMPLE_EH_DISPATCH:
+case GIMPLE_RESX:
+case GIMPLE_TRY:


Most of these won't appear in low gimple either, I think GIMPLE_RESX
does and GIMPLE_EH_DISPATCH too, the rest probably can't.


OK, this was my bad.  I cut and pasted this from some code that was 
originally for the OMP lowering pass.  I've moved the entire 
plausibility filter to a new pass that runs just before OMP lowering. 
It seems easier to detect the things that are invalid in a cloneable 
function when they are still in a form closer to the source constructs.

+  return false;
+
+  /* Asms are not permitted since we don't know what they do.  */
+case GIMPLE_ASM:
+  return false;


What about volatile stmts?  Even volatile loads should be punted on.


That's fixed now too.



+  attr = lookup_attribute ("omp declare simd",
+  DECL_ATTRIBUTES (node->decl));
+
+  /* See if we can add an "omp declare simd" directive implicitly
+ before giving up.  */
+  /* FIXME: OpenACC "#pragma acc routine" translates into
+ "omp declare target", but appears also to have some other effects
+ that conflict with generating SIMD clones, causing ICEs.  So don't
+ do this if we've got OpenACC instead of OpenMP.  */
+  if (attr == NULL_TREE
+  && flag_openmp_target_simd_clone
+  && !oacc_get_fn_attrib (node->decl))


I admit I don't remember where exactly the simd clone happens wrt. other
IPA passes, but I think it is late pass; so, does it happen for GCN
offloading only in the lto1 offloading compiler?
Shouldn't the auto optimization be then done only in the offloading
lto1 for GCN then (say guard on targetm boolean)?


I'm afraid I don't know much about offloading, but I was under the 
impression it all goes through the same compilation process, just with a 
different target?



Otherwise, if we do it say for host offloading fallback as well
(I think it is still undesirable for PTX offloadin

Re: [PATCH 0/6] Add Intel Sierra Forest Instructions

2022-10-16 Thread Bernhard Reutner-Fischer via Gcc-patches
On 17 October 2022 03:02:22 CEST, Hongtao Liu via Gcc-patches 

>> >> Do you have this series as a branch somewhere that I can try on one of the
>> >> likely affected platforms?
>> >
>> > Not yet.
>> > Do we have any external place to put those patches so folks from the
>> > community can validate before it's committed, HJ?


https://gcc.gnu.org/gitwrite.html#vendor

Not sure where in cgit the user branches are visible, though? But they can 
certainly be cloned and worked with.

HTH,


Re: [PATCH 0/6] Add Intel Sierra Forest Instructions

2022-10-16 Thread Hongtao Liu via Gcc-patches
On Mon, Oct 17, 2022 at 9:30 AM Bernhard Reutner-Fischer
 wrote:
>
> On 17 October 2022 03:02:22 CEST, Hongtao Liu via Gcc-patches
>
> >> >> Do you have this series as a branch somewhere that I can try on one of
> >> >> the likely affected platforms?
> >> >
> >> > Not yet.
> >> > Do we have any external place to put those patches so folks from the
> >> > community can validate before it's committed, HJ?
>
>
> https://gcc.gnu.org/gitwrite.html#vendor
>
> Not sure where in cgit the user branches are visible, though? But they can 
> certainly be cloned and worked with.
Thanks for the reminder, I've pushed to remotes/vendors/ix86/ise046.
 * [new ref] refs/vendors/ix86/heads/ise046 ->
vendors/ix86/ise046

>
> HTH,




--
BR,
Hongtao


[r13-3172 Regression] FAIL:libgomp.oacc-c../../libgomp.oacc-c-c..-common/kernels-loop-g.c -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable -O2 (test for excess errors) on Linux/x86_64

2022-10-16 Thread Jiang, Haochen via Gcc-patches
On Linux/x86_64,

f30e9fd33e56a5a721346ea6140722e1b193db42 is the first bad commit
commit f30e9fd33e56a5a721346ea6140722e1b193db42
Author: Eugene Rozenfeld <ero...@microsoft.com>
Date:   Thu Apr 21 16:43:24 2022 -0700

Set discriminators for call stmts on the same line within the same basic 
block.

caused

FAIL: libgomp.oacc-c../../libgomp.oacc-c-c..-common/kernels-loop-g.c 
-DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable  -O2  (test for 
excess errors)

with GCC configured with

../../gcc/configure 
--prefix=/export/users/haochenj/src/gcc-bisect/master/master/r13-2288/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="c++.exp=libgomp.oacc-c-c++-common/kernels-loop-g.c 
--target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="c++.exp=libgomp.oacc-c-c++-common/kernels-loop-g.c 
--target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="c++.exp=libgomp.oacc-c-c++-common/kernels-loop-g.c 
--target_board='unix{-m64}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="c++.exp=libgomp.oacc-c-c++-common/kernels-loop-g.c 
--target_board='unix{-m64\ -march=cascadelake}'"



[r13-3212 Regression] FAIL: gcc.dg/tree-ssa/forwprop-19.c scan-tree-dump-not forwprop1 .VEC_PERM_EXPR. on Linux/x86_64

2022-10-16 Thread Jiang, Haochen via Gcc-patches
On Linux/x86_64,

b88adba751da635c6f0c353c5bc51bbe2ecf4c89 is the first bad commit
commit b88adba751da635c6f0c353c5bc51bbe2ecf4c89
Author: Liwei Xu liwei...@intel.com
Date:   Fri Sep 23 13:46:02 2022 +0800

Optimize nested permutation to single VEC_PERM_EXPR [PR54346]

caused

FAIL: gcc.dg/tree-ssa/forwprop-19.c scan-tree-dump-not forwprop1 .VEC_PERM_EXPR.

with GCC configured with

../../gcc/configure 
--prefix=/export/users/haochenj/src/gcc-bisect/master/master/r13-3212/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="tree-ssa.exp=gcc.dg/tree-ssa/forwprop-19.c 
--target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="tree-ssa.exp=gcc.dg/tree-ssa/forwprop-19.c 
--target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="tree-ssa.exp=gcc.dg/tree-ssa/forwprop-19.c 
--target_board='unix{-m64}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="tree-ssa.exp=gcc.dg/tree-ssa/forwprop-19.c 
--target_board='unix{-m64\ -march=cascadelake}'"



[PATCH] Move scanning pass of forwprop-19.c to dse1 for r13-3212-gb88adba751da63

2022-10-16 Thread Liwei Xu via Gcc-patches
gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/forwprop-19.c: Move scanning pass from forwprop1 to
dse1.  This fixes the test case failure.
---
 gcc/testsuite/gcc.dg/tree-ssa/forwprop-19.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/forwprop-19.c 
b/gcc/testsuite/gcc.dg/tree-ssa/forwprop-19.c
index 4d77138b206..6ca81cb6c49 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/forwprop-19.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/forwprop-19.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O -fdump-tree-forwprop1" } */
+/* { dg-options "-O -fdump-tree-dse1" } */
 
 typedef int vec __attribute__((vector_size (4 * sizeof (int))));
 void f (vec *x1, vec *x2)
@@ -11,4 +11,4 @@ void f (vec *x1, vec *x2)
   *x1 = z;
 }
 
-/* { dg-final { scan-tree-dump-not "VEC_PERM_EXPR" "forwprop1" } } */
+/* { dg-final { scan-tree-dump-not "VEC_PERM_EXPR" "dse1" } } */
-- 
2.18.2



Re: [PATCH] Move scanning pass of forwprop-19.c to dse1 for r13-3212-gb88adba751da63

2022-10-16 Thread Hongtao Liu via Gcc-patches
On Mon, Oct 17, 2022 at 11:26 AM Liwei Xu via Gcc-patches
 wrote:
>
> gcc/testsuite/ChangeLog:
>
> * gcc.dg/tree-ssa/forwprop-19.c: Move scanning pass from forwprop1 to
> dse1.  This fixes the test case failure.
Looks like an obvious fix to me.
> ---
>  gcc/testsuite/gcc.dg/tree-ssa/forwprop-19.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/forwprop-19.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/forwprop-19.c
> index 4d77138b206..6ca81cb6c49 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/forwprop-19.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/forwprop-19.c
> @@ -1,5 +1,5 @@
>  /* { dg-do compile } */
> -/* { dg-options "-O -fdump-tree-forwprop1" } */
> +/* { dg-options "-O -fdump-tree-dse1" } */
>
>  typedef int vec __attribute__((vector_size (4 * sizeof (int))));
>  void f (vec *x1, vec *x2)
> @@ -11,4 +11,4 @@ void f (vec *x1, vec *x2)
>*x1 = z;
>  }
>
> -/* { dg-final { scan-tree-dump-not "VEC_PERM_EXPR" "forwprop1" } } */
> +/* { dg-final { scan-tree-dump-not "VEC_PERM_EXPR" "dse1" } } */
> --
> 2.18.2
>


-- 
BR,
Hongtao


Re: [PATCH 2/6] Support Intel AVX-VNNI-INT8

2022-10-16 Thread Hongtao Liu via Gcc-patches
On Fri, Oct 14, 2022 at 3:57 PM Haochen Jiang via Gcc-patches
 wrote:
>
> From: Kong Lingling 
>
> gcc/ChangeLog
>
> * common/config/i386/cpuinfo.h (get_available_features): Detect
> avxvnniint8.
> * common/config/i386/i386-common.cc
> (OPTION_MASK_ISA2_AVXVNNIINT8_SET): New.
> (OPTION_MASK_ISA2_AVXVNNIINT8_UNSET): Ditto.
> (ix86_handle_option): Handle -mavxvnniint8.
> * common/config/i386/i386-cpuinfo.h (enum processor_features):
> Add FEATURE_AVXVNNIINT8.
> * common/config/i386/i386-isas.h: Add ISA_NAME_TABLE_ENTRY for
> avxvnniint8.
> * config.gcc: Add avxvnniint8intrin.h.
> * config/i386/avxvnniint8intrin.h: New file.
> * config/i386/cpuid.h (bit_AVXVNNIINT8): New.
> * config/i386/i386-builtin.def: Add new builtins.
> * config/i386/i386-c.cc (ix86_target_macros_internal): Define
> __AVXVNNIINT8__.
> * config/i386/i386-options.cc (isa2_opts): Add -mavxvnniint8.
> (ix86_valid_target_attribute_inner_p): Handle avxvnniint8.
> * config/i386/i386-isa.def: Add DEF_PTA(AVXVNNIINT8) New..
> * config/i386/i386.opt: Add option -mavxvnniint8.
> * config/i386/immintrin.h: Include avxvnniint8intrin.h.
> * config/i386/sse.md
> (vpdp_): New define_insn.
> * doc/extend.texi: Document avxvnniint8.
> * doc/invoke.texi: Document -mavxvnniint8.
> * doc/sourcebuild.texi: Document target avxvnniint8.
>
> gcc/testsuite/ChangeLog
>
> * g++.dg/other/i386-2.C: Add -mavxvnniint8.
> * g++.dg/other/i386-3.C: Ditto.
> * gcc.target/i386/avx-check.h: Add avxvnniint8 check.
> * gcc.target/i386/sse-12.c: Add -mavxvnniint8.
> * gcc.target/i386/sse-13.c: Ditto.
> * gcc.target/i386/sse-14.c: Ditto.
> * gcc.target/i386/sse-22.c: Ditto.
> * gcc.target/i386/sse-23.c: Ditto.
> * gcc.target/i386/funcspec-56.inc: Add new target attribute.
> * lib/target-supports.exp
> (check_effective_target_avxvnniint8): New.
> * gcc.target/i386/avxvnniint8-1.c: Ditto.
> * gcc.target/i386/avxvnniint8-vpdpbssd-2.c: Ditto.
> * gcc.target/i386/avxvnniint8-vpdpbssds-2.c: Ditto.
> * gcc.target/i386/avxvnniint8-vpdpbsud-2.c: Ditto.
> * gcc.target/i386/avxvnniint8-vpdpbsuds-2.c: Ditto.
> * gcc.target/i386/avxvnniint8-vpdpbuud-2.c: Ditto.
> * gcc.target/i386/avxvnniint8-vpdpbuuds-2.c: Ditto.
>
> Co-authored-by: Hongyu Wang 
> Co-authored-by: Haochen Jiang 
> ---
>  gcc/common/config/i386/cpuinfo.h  |   2 +
>  gcc/common/config/i386/i386-common.cc |  22 ++-
>  gcc/common/config/i386/i386-cpuinfo.h |   1 +
>  gcc/common/config/i386/i386-isas.h|   2 +
>  gcc/config.gcc|   2 +-
>  gcc/config/i386/avxvnniint8intrin.h   | 138 ++
>  gcc/config/i386/cpuid.h   |   1 +
>  gcc/config/i386/i386-builtin.def  |  14 ++
>  gcc/config/i386/i386-c.cc |   2 +
>  gcc/config/i386/i386-isa.def  |   1 +
>  gcc/config/i386/i386-options.cc   |   4 +-
>  gcc/config/i386/i386.opt  |   5 +
>  gcc/config/i386/immintrin.h   |   2 +
>  gcc/config/i386/sse.md|  31 
>  gcc/doc/extend.texi   |   5 +
>  gcc/doc/invoke.texi   |   9 +-
>  gcc/doc/sourcebuild.texi  |   3 +
>  gcc/testsuite/g++.dg/other/i386-2.C   |   2 +-
>  gcc/testsuite/g++.dg/other/i386-3.C   |   2 +-
>  gcc/testsuite/gcc.target/i386/avx-check.h |   3 +
>  gcc/testsuite/gcc.target/i386/avxvnniint8-1.c |  43 ++
>  .../gcc.target/i386/avxvnniint8-vpdpbssd-2.c  |  72 +
>  .../gcc.target/i386/avxvnniint8-vpdpbssds-2.c |  72 +
>  .../gcc.target/i386/avxvnniint8-vpdpbsud-2.c  |  72 +
>  .../gcc.target/i386/avxvnniint8-vpdpbsuds-2.c |  72 +
>  .../gcc.target/i386/avxvnniint8-vpdpbuud-2.c  |  72 +
>  .../gcc.target/i386/avxvnniint8-vpdpbuuds-2.c |  72 +
>  gcc/testsuite/gcc.target/i386/funcspec-56.inc |   2 +
>  gcc/testsuite/gcc.target/i386/sse-12.c|   2 +-
>  gcc/testsuite/gcc.target/i386/sse-13.c|   2 +-
>  gcc/testsuite/gcc.target/i386/sse-14.c|   2 +-
>  gcc/testsuite/gcc.target/i386/sse-22.c|   4 +-
>  gcc/testsuite/gcc.target/i386/sse-23.c|   2 +-
>  gcc/testsuite/lib/target-supports.exp |  12 ++
>  34 files changed, 738 insertions(+), 14 deletions(-)
>  create mode 100644 gcc/config/i386/avxvnniint8intrin.h
>  create mode 100644 gcc/testsuite/gcc.target/i386/avxvnniint8-1.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/avxvnniint8-vpdpbssd-2.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/avxvnniint8-vpdpbssds-2.c
>  create mode 100644 gcc/testsuite/gcc.target

RE: [r13-3172 Regression] FAIL:libgomp.oacc-c../../libgomp.oacc-c-c..-common/kernels-loop-g.c -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable -O2 (test for excess errors) on Linux/x86_64

2022-10-16 Thread Eugene Rozenfeld via Gcc-patches
That commit had a bug that was fixed in 
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=80f414e6d73f9f1683f93d83ce63a6a482e54bee

Was that fix included in your GCC build?

From: Jiang, Haochen 
Sent: Sunday, October 16, 2022 8:09 PM
To: gcc-patches@gcc.gnu.org; Eugene Rozenfeld ; 
Jiang, Haochen ; gcc-regress...@gcc.gnu.org
Subject: [EXTERNAL] [r13-3172 Regression] 
FAIL:libgomp.oacc-c../../libgomp.oacc-c-c..-common/kernels-loop-g.c 
-DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable -O2 (test for 
excess errors) on Linux/x86_64

On Linux/x86_64,

f30e9fd33e56a5a721346ea6140722e1b193db42 is the first bad commit
commit f30e9fd33e56a5a721346ea6140722e1b193db42
Author: Eugene Rozenfeld <ero...@microsoft.com>
Date:   Thu Apr 21 16:43:24 2022 -0700

Set discriminators for call stmts on the same line within the same basic 
block.

caused

FAIL: libgomp.oacc-c../../libgomp.oacc-c-c..-common/kernels-loop-g.c 
-DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable  -O2  (test for 
excess errors)

with GCC configured with

../../gcc/configure 
--prefix=/export/users/haochenj/src/gcc-bisect/master/master/r13-2288/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="c++.exp=libgomp.oacc-c-c++-common/kernels-loop-g.c 
--target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="c++.exp=libgomp.oacc-c-c++-common/kernels-loop-g.c 
--target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="c++.exp=libgomp.oacc-c-c++-common/kernels-loop-g.c 
--target_board='unix{-m64}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="c++.exp=libgomp.oacc-c-c++-common/kernels-loop-g.c 
--target_board='unix{-m64\ -march=cascadelake}'"



RE: [r13-3172 Regression] FAIL:libgomp.oacc-c../../libgomp.oacc-c-c..-common/kernels-loop-g.c -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable -O2 (test for excess errors) on Linux/x86_64

2022-10-16 Thread Jiang, Haochen via Gcc-patches
If that has been fixed, just ignore that mail.

The result was produced by a script a few days ago. However, the sendmail
service was down on that machine and I only just noticed. So I sent the
result manually today in case the issue had not been fixed.

Sorry for the disturbance!

BRs,
Haochen

From: Eugene Rozenfeld 
Sent: Monday, October 17, 2022 1:23 PM
To: Jiang, Haochen ; gcc-patches@gcc.gnu.org; 
gcc-regress...@gcc.gnu.org
Subject: RE: [r13-3172 Regression] 
FAIL:libgomp.oacc-c../../libgomp.oacc-c-c..-common/kernels-loop-g.c 
-DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable -O2 (test for 
excess errors) on Linux/x86_64

That commit had a bug that was fixed in 
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=80f414e6d73f9f1683f93d83ce63a6a482e54bee

Was that fix included in your GCC build?

From: Jiang, Haochen <haochen.ji...@intel.com>
Sent: Sunday, October 16, 2022 8:09 PM
To: gcc-patches@gcc.gnu.org; Eugene Rozenfeld
<eugene.rozenf...@microsoft.com>; Jiang, Haochen
<haochen.ji...@intel.com>; gcc-regress...@gcc.gnu.org
Subject: [EXTERNAL] [r13-3172 Regression] 
FAIL:libgomp.oacc-c../../libgomp.oacc-c-c..-common/kernels-loop-g.c 
-DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable -O2 (test for 
excess errors) on Linux/x86_64

On Linux/x86_64,

f30e9fd33e56a5a721346ea6140722e1b193db42 is the first bad commit
commit f30e9fd33e56a5a721346ea6140722e1b193db42
Author: Eugene Rozenfeld <ero...@microsoft.com>
Date:   Thu Apr 21 16:43:24 2022 -0700

Set discriminators for call stmts on the same line within the same basic 
block.

caused

FAIL: libgomp.oacc-c../../libgomp.oacc-c-c..-common/kernels-loop-g.c 
-DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable  -O2  (test for 
excess errors)

with GCC configured with

../../gcc/configure 
--prefix=/export/users/haochenj/src/gcc-bisect/master/master/r13-2288/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="c++.exp=libgomp.oacc-c-c++-common/kernels-loop-g.c 
--target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="c++.exp=libgomp.oacc-c-c++-common/kernels-loop-g.c 
--target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="c++.exp=libgomp.oacc-c-c++-common/kernels-loop-g.c 
--target_board='unix{-m64}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="c++.exp=libgomp.oacc-c-c++-common/kernels-loop-g.c 
--target_board='unix{-m64\ -march=cascadelake}'"



Re: [PATCH 4/6] Support Intel AVX-NE-CONVERT

2022-10-16 Thread Hongtao Liu via Gcc-patches
On Fri, Oct 14, 2022 at 3:58 PM Haochen Jiang via Gcc-patches
 wrote:
>
> From: Kong Lingling 
>
> gcc/ChangeLog:
>
> * common/config/i386/i386-common.cc
> (OPTION_MASK_ISA2_AVXNECONVERT_SET,
> OPTION_MASK_ISA2_AVXNECONVERT_UNSET): New.
> (ix86_handle_option): Handle -mavxneconvert, unset
> avxneconvert when avx2 is disabled.
> * common/config/i386/i386-cpuinfo.h (processor_types): Add
> FEATURE_AVXNECONVERT.
> * common/config/i386/i386-isas.h: Add ISA_NAME_TABLE_ENTRY for
> avxneconvert.
> * common/config/i386/cpuinfo.h (get_available_features):
> Detect avxneconvert.
> * config.gcc: Add avxneconvertintrin.h
> * config/i386/avxneconvertintrin.h: New.
> * config/i386/cpuid.h (bit_AVXNECONVERT): New.
> * config/i386/i386-builtin-types.def: Add
> DEF_POINTER_TYPE (PCV8HF, V8HF, CONST),
> DEF_POINTER_TYPE (PCV16HF, V16HF, CONST),
> DEF_FUNCTION_TYPE (V4SF, PCSHORT),
> DEF_FUNCTION_TYPE (V8SF, PCSHORT),
> DEF_FUNCTION_TYPE (V4SF, PCV8BF),
> DEF_FUNCTION_TYPE (V4SF, PCV8HF),
> DEF_FUNCTION_TYPE (V8SF, PCV16HF),
> DEF_FUNCTION_TYPE (V8SF, PCV16BF).
> * config/i386/i386-builtin.def: Add new builtins.
> * config/i386/i386-c.cc (ix86_target_macros_internal): Define
> __AVXNECONVERT__.
> * config/i386/i386-expand.cc (ix86_expand_special_args_builtin):
> Handle V4SF_FTYPE_PCSHORT,V8SF_FTYPE_PCSHORT,V4SF_FTYPE_PCV8BF,
> V4SF_FTYPE_PCV8HF,V8SF_FTYPE_PCV16BF,V8SF_FTYPE_PCV16HF.
> * config/i386/i386-isa.def : Add DEF_PTA(AVXNECONVERT) New.
> * config/i386/i386-options.cc (isa2_opts): Add -mavxneconvert.
> (ix86_valid_target_attribute_inner_p): Handle avxneconvert.
> * config/i386/i386.opt: Add option -mavxneconvert.
> * config/i386/immintrin.h: Include avxneconvertintrin.h.
> * config/i386/sse.md: (avx_vbcstne2ps_),
> (avx_vcvtne2ps_),
> (avx_vcvtne2ps_),
> (avx_vcvtneps2bf16_): New define_insn
> (avx512f_cvtneps2bf16_):Ditto.
> (avx512f_cvtneps2bf16__mask):Ditto.
> * doc/invoke.texi: Document -mavxneconvert.
> * doc/extend.texi: Document avxneconvert.
> * doc/sourcebuild.texi: Document target avxneconvert.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/i386/avx-check.h: Add avxneconvert check.
> * gcc.target/i386/funcspec-56.inc: Add new target attribute.
> * gcc.target/i386/sse-12.c: Add -mavxneconvert.
> * gcc.target/i386/sse-13.c: Ditto.
> * gcc.target/i386/sse-14.c: Ditto.
> * gcc.target/i386/sse-22.c: Ditto.
> * gcc.target/i386/sse-23.c: Ditto.
> * g++.dg/other/i386-2.C: Ditto.
> * g++.dg/other/i386-3.C: Ditto.
> * lib/target-supports.exp:add check_effective_target_avxneconvert.
> * gcc.target/i386/avx-ne-convert-1.c: New test.
> * gcc.target/i386/avx-ne-convert-vbcstnebf162ps-2.c: Ditto.
> * gcc.target/i386/avx-ne-convert-vbcstnesh2ps-2.c: Ditto.
> * gcc.target/i386/avx-ne-convert-vcvtneebf162ps-2.c: Ditto.
> * gcc.target/i386/avx-ne-convert-vcvtneeph2ps-2.c: Ditto.
> * gcc.target/i386/avx-ne-convert-vcvtneobf162ps-2.c: Ditto.
> * gcc.target/i386/avx-ne-convert-vcvtneoph2ps-2.c: Ditto.
> * gcc.target/i386/avx-ne-convert-vcvtneps2bf16-2.c: Ditto.
> * gcc.target/i386/avx512bf16vl-vcvtneps2bf16-1.c: Rename..
> * gcc.target/i386/avx512bf16vl-vcvtneps2bf16-1a.c: To this.
> * gcc.target/i386/avx512bf16vl-vcvtneps2bf16-1b.c: New test.
> ---
>  gcc/common/config/i386/cpuinfo.h  |   2 +
>  gcc/common/config/i386/i386-common.cc |  21 ++-
>  gcc/common/config/i386/i386-cpuinfo.h |   1 +
>  gcc/common/config/i386/i386-isas.h|   2 +
>  gcc/config.gcc|   2 +-
>  gcc/config/i386/avxneconvertintrin.h  | 140 ++
>  gcc/config/i386/cpuid.h   |   1 +
>  gcc/config/i386/i386-builtin-types.def|  17 +++
>  gcc/config/i386/i386-builtin.def  |  18 +++
>  gcc/config/i386/i386-c.cc |   2 +
>  gcc/config/i386/i386-expand.cc|   8 +
>  gcc/config/i386/i386-isa.def  |   1 +
>  gcc/config/i386/i386-options.cc   |   4 +-
>  gcc/config/i386/i386.opt  |   5 +
>  gcc/config/i386/immintrin.h   |   4 +
>  gcc/config/i386/sse.md| 100 -
>  gcc/doc/extend.texi   |   5 +
>  gcc/doc/invoke.texi   |   9 +-
>  gcc/doc/sourcebuild.texi  |   3 +
>  gcc/testsuite/g++.dg/other/i386-2.C   |   2 +-
>  gcc/testsuite/g++.dg/other/i386-3.C   |   2 +-
>  gcc/testsuite/gcc.target/i386/avx-check.h |   3 +
>  .../gcc

Re: [PATCH] middle-end IFN_ASSUME support [PR106654]

2022-10-16 Thread Martin Uecker via Gcc-patches
On Saturday, 15.10.2022, at 10:53 +0200, Jakub Jelinek wrote:
> On Sat, Oct 15, 2022 at 10:07:46AM +0200, Martin Uecker wrote:
> > But why? Do we really want to encourage people to
> > write such code?
> 
> Of course these ++ cases inside of expressions are just obfuscation.
> But the point is to support using predicates that can be inlined and
> simplified into something really simple the optimizers can understand.

This makes sense.

> The paper shows as useful e.g. being able to assert something is finite:
> [[assume (std::isfinite (x))]];
> and with the recent changes on the GCC side it is now or shortly will be
> possible to take advantage of such predicates.
> It is true that
> [[assume (__builtin_isfinite (x))]];
> could work if we check TREE_SIDE_EFFECTS on the GCC side because
> it is a const function, but that is a GNU extension, so the standard
> can't count on that.  std::isfinite isn't even marked const in libstdc++
> and one can figure that out during IPA propagation only.

Hm, that already seems to work with

if (!std::isfinite(x))
  __builtin_unreachable();

https://godbolt.org/z/hj3WrEhjb
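
For readers who want to try this locally, the pattern can be written as a small self-contained sketch. The macro name ASSUME_VIA_UNREACHABLE and the function scale are hypothetical, chosen only for illustration. Note that, unlike [[assume]], this form really evaluates the predicate, so it is only a fair substitute when the predicate has no side effects.

```cpp
#include <cmath>

// Hypothetical helper macro, not part of GCC or the standard library.
// Unlike [[assume (cond)]], cond is really evaluated here, so it should
// be free of side effects.
#define ASSUME_VIA_UNREACHABLE(cond) \
  do { if (!(cond)) __builtin_unreachable (); } while (0)

// Hypothetical function: after the check, the optimizer may rely on x
// being finite (neither infinity nor NaN).
double
scale (double x)
{
  ASSUME_VIA_UNREACHABLE (std::isfinite (x));
  return x * 0.5;
}
```

Calling scale with a non-finite argument is undefined behavior in this sketch, which is exactly the property the optimizer is allowed to exploit.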


> There are many similar predicates, or user could have some that are useful
> to his program.  And either in the end it wouldn't have side-effects
> but the compiler doesn't know, or would but those side-effects would be
> unimportant to the optimizations the compiler can derive from those.

I still have the feeling that relying on something
such as the pure and const attributes might then
be a better approach for this.

From the standard's point of view, this is OK
as GCC can just set its own rules as long as it is
a subset of what the standard allows.

> As the spec defines it well what happens with the side-effects and it
> is an attribute, not a function and the languages have non-evaluated
> contexts in other places, I don't see where a user confusion could come.

The user confusion might come when somebody writes
something such as [[assume(1 == ++i)]] and I expect
that people will start doing this once this works.


But I am also a bit worried about the slippery slope
of exploiting this more, because what "would evaluate to true"
implies in the case of I/O, atomic accesses, volatile accesses,
etc. does not seem clear to me.  But maybe I am worrying
too much.


> We don't warn for sizeof (i++) and similar either.

Which is also confusing and clang does indeed
warn about it outside of macros and I think GCC
should too.
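
To make the sizeof point concrete, here is a minimal sketch (the function name value_after_sizeof is hypothetical). The operand of sizeof is an unevaluated context, so the increment never happens.

```cpp
#include <cstddef>

// Hypothetical helper: returns the value of i after applying sizeof to i++.
int
value_after_sizeof ()
{
  int i = 0;
  // The operand of sizeof is not evaluated, so i is not incremented.
  std::size_t n = sizeof (i++);
  (void) n;  // silence unused-variable warnings
  return i;
}
```

The function returns 0, which is the same kind of surprise a reader might get from [[assume(1 == ++i)]].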

> __builtin_assume (i++) is a bad choice because it looks like a function
> call (after all, the compilers have many similar builtins) and its argument
> looks like normal argument to the function, so it is certainly unexpected
> that the side-effects aren't evaluated.

I agree.

Best
Martin




RE: [r13-3172 Regression] FAIL:libgomp.oacc-c../../libgomp.oacc-c-c..-common/kernels-loop-g.c -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable -O2 (test for excess errors) on Linux/x86_64

2022-10-16 Thread Jiang, Haochen via Gcc-patches
Hi Rozenfeld,

I just checked out your commit and the test still fails.

It reports the following:
xgcc: error: 
/export/users2/haochenj/src/gcc/master/./libgomp/testsuite/libgomp.oacc-c++/../libgomp.oacc-c-c++-common/kernels-loop-g.c:
 '-fcompare-debug' failure (length)

This also fixes a typo in the mail I sent manually; the commands to reproduce should be as follows.

To reproduce:

$ cd {build_dir}/x86_64-linux/libgomp/testsuite && make check \
  RUNTESTFLAGS="c++.exp=libgomp.oacc-c-c++-common/kernels-loop-g.c --target_board='unix{-m32}'"
$ cd {build_dir}/x86_64-linux/libgomp/testsuite && make check \
  RUNTESTFLAGS="c++.exp=libgomp.oacc-c-c++-common/kernels-loop-g.c --target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/x86_64-linux/libgomp/testsuite && make check \
  RUNTESTFLAGS="c++.exp=libgomp.oacc-c-c++-common/kernels-loop-g.c --target_board='unix{-m64}'"
$ cd {build_dir}/x86_64-linux/libgomp/testsuite && make check \
  RUNTESTFLAGS="c++.exp=libgomp.oacc-c-c++-common/kernels-loop-g.c --target_board='unix{-m64\ -march=cascadelake}'"

BRs,
Haochen

From: Jiang, Haochen
Sent: Monday, October 17, 2022 1:41 PM
To: Eugene Rozenfeld ; gcc-patches@gcc.gnu.org; 
gcc-regress...@gcc.gnu.org
Subject: RE: [r13-3172 Regression] 
FAIL:libgomp.oacc-c../../libgomp.oacc-c-c..-common/kernels-loop-g.c 
-DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable -O2 (test for 
excess errors) on Linux/x86_64

If that has been fixed, just ignore that mail.

The result was produced by a script a few days ago. However, the sendmail
service was down on that machine and I only just noticed, so I sent the
result manually today in case the issue is not fixed yet.

Sorry for the disturbance!

BRs,
Haochen

From: Eugene Rozenfeld <eugene.rozenf...@microsoft.com>
Sent: Monday, October 17, 2022 1:23 PM
To: Jiang, Haochen <haochen.ji...@intel.com>; gcc-patches@gcc.gnu.org;
gcc-regress...@gcc.gnu.org
Subject: RE: [r13-3172 Regression] 
FAIL:libgomp.oacc-c../../libgomp.oacc-c-c..-common/kernels-loop-g.c 
-DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable -O2 (test for 
excess errors) on Linux/x86_64

That commit had a bug that was fixed in 
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=80f414e6d73f9f1683f93d83ce63a6a482e54bee

Was that fix included in your GCC build?

From: Jiang, Haochen <haochen.ji...@intel.com>
Sent: Sunday, October 16, 2022 8:09 PM
To: gcc-patches@gcc.gnu.org; Eugene Rozenfeld <eugene.rozenf...@microsoft.com>;
Jiang, Haochen <haochen.ji...@intel.com>; gcc-regress...@gcc.gnu.org
Subject: [EXTERNAL] [r13-3172 Regression] 
FAIL:libgomp.oacc-c../../libgomp.oacc-c-c..-common/kernels-loop-g.c 
-DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable -O2 (test for 
excess errors) on Linux/x86_64

On Linux/x86_64,

f30e9fd33e56a5a721346ea6140722e1b193db42 is the first bad commit
commit f30e9fd33e56a5a721346ea6140722e1b193db42
Author: Eugene Rozenfeld <ero...@microsoft.com>
Date:   Thu Apr 21 16:43:24 2022 -0700

Set discriminators for call stmts on the same line within the same basic 
block.

caused

FAIL: libgomp.oacc-c../../libgomp.oacc-c-c..-common/kernels-loop-g.c 
-DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable  -O2  (test for 
excess errors)

with GCC configured with

../../gcc/configure 
--prefix=/export/users/haochenj/src/gcc-bisect/master/master/r13-2288/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check \
  RUNTESTFLAGS="c++.exp=libgomp.oacc-c-c++-common/kernels-loop-g.c --target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check \
  RUNTESTFLAGS="c++.exp=libgomp.oacc-c-c++-common/kernels-loop-g.c --target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check \
  RUNTESTFLAGS="c++.exp=libgomp.oacc-c-c++-common/kernels-loop-g.c --target_board='unix{-m64}'"
$ cd {build_dir}/gcc && make check \
  RUNTESTFLAGS="c++.exp=libgomp.oacc-c-c++-common/kernels-loop-g.c --target_board='unix{-m64\ -march=cascadelake}'"



Re: [PATCH] [PR24021] Implement PLUS_EXPR range-op entry for floats.

2022-10-16 Thread Aldy Hernandez via Gcc-patches
On Thu, Oct 13, 2022 at 7:57 PM Jakub Jelinek  wrote:
>
> On Thu, Oct 13, 2022 at 02:36:49PM +0200, Aldy Hernandez wrote:
> > +// Like real_arithmetic, but round the result to INF if the operation
> > +// produced inexact results.
> > +//
> > +// ?? There is still one problematic case, i387.  With
> > +// -fexcess-precision=standard we perform most SF/DFmode arithmetic in
> > +// XFmode (long_double_type_node), so that case is OK.  But without
> > +// -mfpmath=sse, all the SF/DFmode computations are in XFmode
> > +// precision (64-bit mantissa) and only occassionally rounded to
> > +// SF/DFmode (when storing into memory from the 387 stack).  Maybe
> > +// this is ok as well though it is just occassionally more precise. ??
> > +
> > +static void
> > +frange_arithmetic (enum tree_code code, tree type,
> > +REAL_VALUE_TYPE &result,
> > +const REAL_VALUE_TYPE &op1,
> > +const REAL_VALUE_TYPE &op2,
> > +const REAL_VALUE_TYPE &inf)
> > +{
> > +  REAL_VALUE_TYPE value;
> > +  enum machine_mode mode = TYPE_MODE (type);
> > +  bool mode_composite = MODE_COMPOSITE_P (mode);
> > +
> > +  bool inexact = real_arithmetic (&value, code, &op1, &op2);
> > +  real_convert (&result, mode, &value);
> > +
> > +  // If real_convert above has rounded an inexact value to towards
> > +  // inf, we can keep the result as is, otherwise we'll adjust by 1 ulp
> > +  // later (real_nextafter).
> > +  bool rounding = (flag_rounding_math
> > +&& (real_isneg (&inf)
> > +? real_less (&result, &value)
> > +: !real_less (&value, &result)));
>
> I thought the agreement during Cauldron was that we'd do this always,
> regardless of flag_rounding_math.
> Because excess precision (the fast one like on ia32 or -mfpmath=387 on
> x86_64), or -frounding-math, or FMA contraction can all increase precision
> and worst case it all behaves like -frounding-math for the ranges.
>
> So, perhaps use:
>   if ((mode_composite || (real_isneg (&inf) ? real_less (&result, &value)
> : !real_less (&value, &result)))
>   && (inexact || !real_identical (&result, &value)))

Done.

> ?
> No need to do the real_isneg/real_less stuff for mode_composite, then
> we do it always for inexacts, but otherwise we check if the rounding
> performed by real.cc has been in the conservative direction (for upper
> bound to +inf, for lower bound to -inf), if yes, we don't need to do
> anything, if not, we frange_nextafter.
>
> As discussed, for mode_composite, I think we want to do the extra
> stuff for inexact denormals and otherwise do the nextafter unconditionally,
> because our internal mode_composite representation isn't precise enough.
>
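
The idea of widening a bound in the conservative direction can be illustrated outside the compiler with plain doubles. This is only a simplified sketch, not GCC's frange_arithmetic: the function name upper_bound_of_sum is hypothetical, it assumes the addition itself is performed in double precision with round-to-nearest, and it steps by one ulp unconditionally instead of first checking whether the result was inexact or already rounded toward +inf.

```cpp
#include <cmath>
#include <limits>

// Hypothetical helper, not GCC's frange_arithmetic: compute an upper bound
// for a + b that still holds if the runtime addition rounds upward.
double
upper_bound_of_sum (double a, double b)
{
  double s = a + b;  // assumed to be rounded to nearest
  // Without knowing whether the addition was exact, step one ulp toward
  // +inf; this is at least as large as the round-upward result.
  return std::nextafter (s, std::numeric_limits<double>::infinity ());
}
```

A lower bound would be handled symmetrically by stepping toward -inf. The check discussed above exists to avoid this extra step when the conversion has already rounded in the safe direction.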
> > +  // Be extra careful if there may be discrepancies between the
> > +  // compile and runtime results.
> > +  if ((rounding || mode_composite)
> > +  && (inexact || !real_identical (&result, &value)))
> > +{
> > +  if (mode_composite)
> > + {
> > +   bool denormal = (result.sig[SIGSZ-1] & SIG_MSB) == 0;
>
> Use real_isdenormal here?

Done.

> Though, real_iszero needs the same thing.

So... real_isdenormal() || real_iszero() as in the attached patch?

>
> > +   if (denormal)
> > + {
> > +   REAL_VALUE_TYPE tmp;
>
> And explain here why is this, that IBM extended denormals have just
> DFmode precision.

Done.

> Though, now that I think about it, while this is correct for denormals,
>
> > +   real_convert (&tmp, DFmode, &value);
> > +   frange_nextafter (DFmode, tmp, inf);
> > +   real_convert (&result, mode, &tmp);
> > + }
>
> there are also the cases where the higher double exponent is in the
> [__DBL_MIN_EXP__, __LDBL_MIN_EXP__] aka [-1021, -968] or so.
> https://en.wikipedia.org/wiki/Double-precision_floating-point_format
> If the upper double is denormal in the DFmode sense, so smaller absolute
> value than __DBL_MIN__, then doing nextafter in DFmode is the right thing to
> do, the lower double must be always +/- zero.
> Now, if the result is __DBL_MIN__, the upper double is already normalized
> but we can add __DBL_DENORM_MIN__ to it, which will make the number have
> 54-bit precision.
> If the result is __DBL_MIN__ * 2, we can again add __DBL_DENORM_MIN__
> and make it 55-bit precision.  Etc. until we reach __DBL_MIN__ * 2e53
> where it acts like fully normalized 106-bit precision number.
> I must say I'm not really sure what real_nextafter is doing in those cases,
> I'm afraid it doesn't handle it correctly but the only other use
> of real_nextafter is guarded with:
>   /* Don't handle composite modes, nor decimal, nor modes without
>  inf or denorm at least for now.  */
>   if (format->pnan < format->p
>   || format->b == 10
>   || !format->has_inf
>   || !format->has_denorm)
> return false;

Dunno.  Is there a conservative thing we can do for mode_composites
that aren't denor

RE: [PATCH 2/6] Support Intel AVX-VNNI-INT8

2022-10-16 Thread Jiang, Haochen via Gcc-patches
> -Original Message-
> From: Hongtao Liu 
> Sent: Monday, October 17, 2022 12:05 PM
> To: Jiang, Haochen 
> Cc: gcc-patches@gcc.gnu.org; Liu, Hongtao 
> Subject: Re: [PATCH 2/6] Support Intel AVX-VNNI-INT8
> 
> On Fri, Oct 14, 2022 at 3:57 PM Haochen Jiang via Gcc-patches  patc...@gcc.gnu.org> wrote:
> >
> > From: Kong Lingling 
> >
> > gcc/ChangeLog
> >
> > * common/config/i386/cpuinfo.h (get_available_features): Detect
> > avxvnniint8.
> > * common/config/i386/i386-common.cc
> > (OPTION_MASK_ISA2_AVXVNNIINT8_SET): New.
> > (OPTION_MASK_ISA2_AVXVNNIINT8_UNSET): Ditto.
> > (ix86_handle_option): Handle -mavxvnniint8.
> > * common/config/i386/i386-cpuinfo.h (enum processor_features):
> > Add FEATURE_AVXVNNIINT8.
> > * common/config/i386/i386-isas.h: Add ISA_NAME_TABLE_ENTRY for
> > avxvnniint8.
> > * config.gcc: Add avxvnniint8intrin.h.
> > * config/i386/avxvnniint8intrin.h: New file.
> > * config/i386/cpuid.h (bit_AVXVNNIINT8): New.
> > * config/i386/i386-builtin.def: Add new builtins.
> > * config/i386/i386-c.cc (ix86_target_macros_internal): Define
> > __AVXVNNIINT8__.
> > * config/i386/i386-options.cc (isa2_opts): Add -mavxvnniint8.
> > (ix86_valid_target_attribute_inner_p): Handle avxvnniint8.
> > * config/i386/i386-isa.def: Add DEF_PTA(AVXVNNIINT8) New..
> > * config/i386/i386.opt: Add option -mavxvnniint8.
> > * config/i386/immintrin.h: Include avxvnniint8intrin.h.
> > * config/i386/sse.md
> > (vpdp_): New define_insn.
> > * doc/extend.texi: Document avxvnniint8.
> > * doc/invoke.texi: Document -mavxvnniint8.
> > * doc/sourcebuild.texi: Document target avxvnniint8.
> >
> > gcc/testsuite/ChangeLog
> >
> > * g++.dg/other/i386-2.C: Add -mavxvnniint8.
> > * g++.dg/other/i386-3.C: Ditto.
> > * gcc.target/i386/avx-check.h: Add avxvnniint8 check.
> > * gcc.target/i386/sse-12.c: Add -mavxvnniint8.
> > * gcc.target/i386/sse-13.c: Ditto.
> > * gcc.target/i386/sse-14.c: Ditto.
> > * gcc.target/i386/sse-22.c: Ditto.
> > * gcc.target/i386/sse-23.c: Ditto.
> > * gcc.target/i386/funcspec-56.inc: Add new target attribute.
> > * lib/target-supports.exp
> > (check_effective_target_avxvnniint8): New.
> > * gcc.target/i386/avxvnniint8-1.c: Ditto.
> > * gcc.target/i386/avxvnniint8-vpdpbssd-2.c: Ditto.
> > * gcc.target/i386/avxvnniint8-vpdpbssds-2.c: Ditto.
> > * gcc.target/i386/avxvnniint8-vpdpbsud-2.c: Ditto.
> > * gcc.target/i386/avxvnniint8-vpdpbsuds-2.c: Ditto.
> > * gcc.target/i386/avxvnniint8-vpdpbuud-2.c: Ditto.
> > * gcc.target/i386/avxvnniint8-vpdpbuuds-2.c: Ditto.
> >
> > Co-authored-by: Hongyu Wang 
> > Co-authored-by: Haochen Jiang 
> > ---
> >  gcc/common/config/i386/cpuinfo.h  |   2 +
> >  gcc/common/config/i386/i386-common.cc |  22 ++-
> >  gcc/common/config/i386/i386-cpuinfo.h |   1 +
> >  gcc/common/config/i386/i386-isas.h|   2 +
> >  gcc/config.gcc|   2 +-
> >  gcc/config/i386/avxvnniint8intrin.h   | 138 ++
> >  gcc/config/i386/cpuid.h   |   1 +
> >  gcc/config/i386/i386-builtin.def  |  14 ++
> >  gcc/config/i386/i386-c.cc |   2 +
> >  gcc/config/i386/i386-isa.def  |   1 +
> >  gcc/config/i386/i386-options.cc   |   4 +-
> >  gcc/config/i386/i386.opt  |   5 +
> >  gcc/config/i386/immintrin.h   |   2 +
> >  gcc/config/i386/sse.md|  31 
> >  gcc/doc/extend.texi   |   5 +
> >  gcc/doc/invoke.texi   |   9 +-
> >  gcc/doc/sourcebuild.texi  |   3 +
> >  gcc/testsuite/g++.dg/other/i386-2.C   |   2 +-
> >  gcc/testsuite/g++.dg/other/i386-3.C   |   2 +-
> >  gcc/testsuite/gcc.target/i386/avx-check.h |   3 +
> >  gcc/testsuite/gcc.target/i386/avxvnniint8-1.c |  43 ++
> > .../gcc.target/i386/avxvnniint8-vpdpbssd-2.c  |  72 +
> > .../gcc.target/i386/avxvnniint8-vpdpbssds-2.c |  72 +
> > .../gcc.target/i386/avxvnniint8-vpdpbsud-2.c  |  72 +
> > .../gcc.target/i386/avxvnniint8-vpdpbsuds-2.c |  72 +
> > .../gcc.target/i386/avxvnniint8-vpdpbuud-2.c  |  72 +
> > .../gcc.target/i386/avxvnniint8-vpdpbuuds-2.c |  72 +
> >  gcc/testsuite/gcc.target/i386/funcspec-56.inc |   2 +
> >  gcc/testsuite/gcc.target/i386/sse-12.c|   2 +-
> >  gcc/testsuite/gcc.target/i386/sse-13.c|   2 +-
> >  gcc/testsuite/gcc.target/i386/sse-14.c|   2 +-
> >  gcc/testsuite/gcc.target/i386/sse-22.c|   4 +-
> >  gcc/testsuite/gcc.target/i386/sse-23.c|   2 +-
> >  gcc/testsuite/lib/target

Re: [PATCH 2/6] Support Intel AVX-VNNI-INT8

2022-10-16 Thread Hongtao Liu via Gcc-patches
On Mon, Oct 17, 2022 at 2:27 PM Jiang, Haochen  wrote:
>
> > -Original Message-
> > From: Hongtao Liu 
> > Sent: Monday, October 17, 2022 12:05 PM
> > To: Jiang, Haochen 
> > Cc: gcc-patches@gcc.gnu.org; Liu, Hongtao 
> > Subject: Re: [PATCH 2/6] Support Intel AVX-VNNI-INT8
> >
> > On Fri, Oct 14, 2022 at 3:57 PM Haochen Jiang via Gcc-patches  > patc...@gcc.gnu.org> wrote:
> > >
> > > From: Kong Lingling 
> > >
> > > gcc/ChangeLog
> > >
> > > * common/config/i386/cpuinfo.h (get_available_features): Detect
> > > avxvnniint8.
> > > * common/config/i386/i386-common.cc
> > > (OPTION_MASK_ISA2_AVXVNNIINT8_SET): New.
> > > (OPTION_MASK_ISA2_AVXVNNIINT8_UNSET): Ditto.
> > > (ix86_handle_option): Handle -mavxvnniint8.
> > > * common/config/i386/i386-cpuinfo.h (enum processor_features):
> > > Add FEATURE_AVXVNNIINT8.
> > > * common/config/i386/i386-isas.h: Add ISA_NAME_TABLE_ENTRY for
> > > avxvnniint8.
> > > * config.gcc: Add avxvnniint8intrin.h.
> > > * config/i386/avxvnniint8intrin.h: New file.
> > > * config/i386/cpuid.h (bit_AVXVNNIINT8): New.
> > > * config/i386/i386-builtin.def: Add new builtins.
> > > * config/i386/i386-c.cc (ix86_target_macros_internal): Define
> > > __AVXVNNIINT8__.
> > > * config/i386/i386-options.cc (isa2_opts): Add -mavxvnniint8.
> > > (ix86_valid_target_attribute_inner_p): Handle avxvnniint8.
> > > * config/i386/i386-isa.def: Add DEF_PTA(AVXVNNIINT8) New..
> > > * config/i386/i386.opt: Add option -mavxvnniint8.
> > > * config/i386/immintrin.h: Include avxvnniint8intrin.h.
> > > * config/i386/sse.md
> > > (vpdp_): New define_insn.
> > > * doc/extend.texi: Document avxvnniint8.
> > > * doc/invoke.texi: Document -mavxvnniint8.
> > > * doc/sourcebuild.texi: Document target avxvnniint8.
> > >
> > > gcc/testsuite/ChangeLog
> > >
> > > * g++.dg/other/i386-2.C: Add -mavxvnniint8.
> > > * g++.dg/other/i386-3.C: Ditto.
> > > * gcc.target/i386/avx-check.h: Add avxvnniint8 check.
> > > * gcc.target/i386/sse-12.c: Add -mavxvnniint8.
> > > * gcc.target/i386/sse-13.c: Ditto.
> > > * gcc.target/i386/sse-14.c: Ditto.
> > > * gcc.target/i386/sse-22.c: Ditto.
> > > * gcc.target/i386/sse-23.c: Ditto.
> > > * gcc.target/i386/funcspec-56.inc: Add new target attribute.
> > > * lib/target-supports.exp
> > > (check_effective_target_avxvnniint8): New.
> > > * gcc.target/i386/avxvnniint8-1.c: Ditto.
> > > * gcc.target/i386/avxvnniint8-vpdpbssd-2.c: Ditto.
> > > * gcc.target/i386/avxvnniint8-vpdpbssds-2.c: Ditto.
> > > * gcc.target/i386/avxvnniint8-vpdpbsud-2.c: Ditto.
> > > * gcc.target/i386/avxvnniint8-vpdpbsuds-2.c: Ditto.
> > > * gcc.target/i386/avxvnniint8-vpdpbuud-2.c: Ditto.
> > > * gcc.target/i386/avxvnniint8-vpdpbuuds-2.c: Ditto.
> > >
> > > Co-authored-by: Hongyu Wang 
> > > Co-authored-by: Haochen Jiang 
> > > ---
> > >  gcc/common/config/i386/cpuinfo.h  |   2 +
> > >  gcc/common/config/i386/i386-common.cc |  22 ++-
> > >  gcc/common/config/i386/i386-cpuinfo.h |   1 +
> > >  gcc/common/config/i386/i386-isas.h|   2 +
> > >  gcc/config.gcc|   2 +-
> > >  gcc/config/i386/avxvnniint8intrin.h   | 138 ++
> > >  gcc/config/i386/cpuid.h   |   1 +
> > >  gcc/config/i386/i386-builtin.def  |  14 ++
> > >  gcc/config/i386/i386-c.cc |   2 +
> > >  gcc/config/i386/i386-isa.def  |   1 +
> > >  gcc/config/i386/i386-options.cc   |   4 +-
> > >  gcc/config/i386/i386.opt  |   5 +
> > >  gcc/config/i386/immintrin.h   |   2 +
> > >  gcc/config/i386/sse.md|  31 
> > >  gcc/doc/extend.texi   |   5 +
> > >  gcc/doc/invoke.texi   |   9 +-
> > >  gcc/doc/sourcebuild.texi  |   3 +
> > >  gcc/testsuite/g++.dg/other/i386-2.C   |   2 +-
> > >  gcc/testsuite/g++.dg/other/i386-3.C   |   2 +-
> > >  gcc/testsuite/gcc.target/i386/avx-check.h |   3 +
> > >  gcc/testsuite/gcc.target/i386/avxvnniint8-1.c |  43 ++
> > > .../gcc.target/i386/avxvnniint8-vpdpbssd-2.c  |  72 +
> > > .../gcc.target/i386/avxvnniint8-vpdpbssds-2.c |  72 +
> > > .../gcc.target/i386/avxvnniint8-vpdpbsud-2.c  |  72 +
> > > .../gcc.target/i386/avxvnniint8-vpdpbsuds-2.c |  72 +
> > > .../gcc.target/i386/avxvnniint8-vpdpbuud-2.c  |  72 +
> > > .../gcc.target/i386/avxvnniint8-vpdpbuuds-2.c |  72 +
> > >  gcc/testsuite/gcc.target/i386/funcspec-56.inc |   2 +
> > >  gcc/testsuite/gcc.target/i386/sse-12.c|   2 +-
> > >  gcc/testsuite/gcc.

Re: [PATCH] middle-end, v3: IFN_ASSUME support [PR106654]

2022-10-16 Thread Richard Biener via Gcc-patches
On Fri, 14 Oct 2022, Jakub Jelinek wrote:

> On Fri, Oct 14, 2022 at 11:27:07AM +, Richard Biener wrote:
> > > --- gcc/function.h.jj 2022-10-10 11:57:40.163722972 +0200
> > > +++ gcc/function.h2022-10-12 19:48:28.887554771 +0200
> > > @@ -438,6 +438,10 @@ struct GTY(()) function {
> > >  
> > >/* Set if there are any OMP_TARGET regions in the function.  */
> > >unsigned int has_omp_target : 1;
> > > +
> > > +  /* Set for artificial function created for [[assume (cond)]].
> > > + These should be GIMPLE optimized, but not expanded to RTL.  */
> > > +  unsigned int assume_function : 1;
> > 
> > I wonder if we should have this along force_output in the symtab
> > node and let the symtab code decide whether to expand?
> 
> I actually first had a flag on the symtab node but as the patch shows,
> when it needs to be tested, more frequently I have access to struct function
> than to cgraph node.

I see.

> > > --- gcc/gimplify.cc.jj2022-10-10 11:57:40.165722944 +0200
> > > +++ gcc/gimplify.cc   2022-10-12 19:48:28.890554730 +0200
> > > @@ -3569,7 +3569,52 @@ gimplify_call_expr (tree *expr_p, gimple
> > >fndecl, 0));
> > > return GS_OK;
> > >   }
> > > -   /* FIXME: Otherwise expand it specially.  */
> > > +   /* If not optimizing, ignore the assumptions.  */
> > > +   if (!optimize)
> > > + {
> > > +   *expr_p = NULL_TREE;
> > > +   return GS_ALL_DONE;
> > > + }
> > > +   /* Temporarily, until gimple lowering, transform
> > > +  .ASSUME (cond);
> > > +  into:
> > > +  guard = .ASSUME ();
> > > +  if (guard) goto label_true; else label_false;
> > > +  label_true:;
> > > +  {
> > > +guard = cond;
> > > +  }
> > > +  label_false:;
> > > +  .ASSUME (guard);
> > > +  such that gimple lowering can outline the condition into
> > > +  a separate function easily.  */
> > 
> > So the idea to use lambdas and/or nested functions (for OMP)
> > didn't work out or is more complicated?
> 
> Yes, that didn't work out.  Both lambda creation and nested function
> handling produce big structures with everything while for the assumptions
> it is better to have separate scalars if possible, lambda creation has
> various language imposed restrictions, diagnostics etc. and isn't
> available in C and I think the outlining in the patch is pretty simple and
> short.
> 
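
As a rough source-level analogy for the outlining described above (and only an analogy), the condition can be pictured as a separate function that takes the used variables as individual scalars. The names f and f_assume_0 are hypothetical. In the patch the artificial function is GIMPLE-optimized but never expanded to RTL, whereas in this sketch the predicate is really called.

```cpp
// Hypothetical name; stands in for the artificial function that holds
// the outlined condition.  The used variables are passed as separate
// scalars rather than through one big closure-like structure.
static inline bool
f_assume_0 (int x, int y)
{
  return x > 0 && y == 2 * x;
}

int
f (int x, int y)
{
  // Analogy only: here the predicate is evaluated at run time, whereas
  // the assumption in the patch never produces executable code.
  if (!f_assume_0 (x, y))
    __builtin_unreachable ();
  return y / x;
}
```

With the predicate inlined, an optimizer may simplify f to return 2, which is the kind of information the assumption is meant to convey.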
> > I wonder if, instead of using the above intermediate form we
> > > can have a new structured GIMPLE code with sub-statements
> > 
> >  .ASSUME
> >{
> >  condition;
> >}
> 
> That is what I wrote in the patch description as alternative:
> "with the condition wrapped into a GIMPLE_BIND (I admit the above isn't   
>   
>
> extra clean but it is just something to hold it from gimplifier until 
>   
>   
> gimple low pass; it reassembles if (condition_never_true) { cond; };  
>   
>   
> an alternative would be introduce GOMP_ASSUME statement that would have   
>   
>   
> the guard var as operand and the GIMPLE_BIND as body, but for the 
>   
>   
> few passes (tree-nested and omp lowering) in between that looked like 
>   
>   
> an overkill to me)"
> I can certainly implement that easily.

I'd prefer that, it looks possibly less messy.

> > ?  There's gimple_statement_omp conveniently available as base and
> > IIRC you had the requirement to implement some OMP assume as well?
> 
> For OpenMP assumptions we right now implement just the holds clause
> of assume and implement it the same way as assume/gnu::assume attributes.
> 
> > > Of course a different stmt class with body would work as well here,
> > maybe we can even use a gbind with a special flag.
> > 
> > > The outlining code can then be adjusted to outline a single BIND?
> 
> It already is adjusting a single bind (of course with everything nested in
> it).
> 
> > It probably won't simplify much that way.
> 
> > > +static tree
> > > +create_assumption_fn (location_t loc)
> > > +{
> > > > +  tree name = clone_function_name_numbered (current_function_decl, "_assume");
> > > +  /* For now, will be changed later.  */
> > 
> > ?
> 
> I need to create the FUNCTION_DECL early and only later on discover
> the used automatic vars (for which I need the destination function)
> and only once those are discovered I can create