Ping^3: [PATCH] warn-access: ignore template parameters when matching operator new/delete [PR109224]
Arsen Arsenović writes: > Gentle ping again. Full patch: > https://inbox.sourceware.org/gcc-patches/86y14ptvdi@aarsen.me/ And again. To clarify, the above is a v2 of sorts (it has the comment fixed, I just didn't update the subject). TIA, have a lovely day. -- Arsen Arsenović
[PATCH] match: Fix `a != 0 ? a * b : 0` patterns for things that trap [PR116772]
For generic, `a != 0 ? a * b : 0` would match where `b` would be an expression which trap (in the case of the testcase, it was an integer division but it could be any). This fixes the issue by adding a condition for `(a != 0 ? expr : 0)` to check for expressions which have side effects or traps. PR middle-end/116772 gcc/ChangeLog: * match.pd (`a != 0 ? a / b : 0`): Add a check to make sure b does not trap or have side effects. (`a != 0 ? a * b : 0`, `a != 0 ? a & b : 0`): Likewise. gcc/testsuite/ChangeLog: * gcc.dg/torture/pr116772-1.c: New test. Signed-off-by: Andrew Pinski --- gcc/match.pd | 12 ++-- gcc/testsuite/gcc.dg/torture/pr116772-1.c | 24 +++ 2 files changed, 34 insertions(+), 2 deletions(-) create mode 100644 gcc/testsuite/gcc.dg/torture/pr116772-1.c diff --git a/gcc/match.pd b/gcc/match.pd index fdb59ff0d44..db46f319c5f 100644 --- a/gcc/match.pd +++ b/gcc/match.pd @@ -4663,7 +4663,11 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) (simplify (cond (ne @0 integer_zerop) (op@2 @3 @1) integer_zerop ) (if (bitwise_equal_p (@0, @3) -&& tree_expr_nonzero_p (@1)) +&& tree_expr_nonzero_p (@1) + /* Cannot make a trapping expression or with one with side + effects unconditional. */ + && !generic_expr_could_trap_p (@3) + && (GIMPLE || !TREE_SIDE_EFFECTS (@3))) @2))) /* Note we prefer the != case here @@ -4673,7 +4677,11 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) (for op (mult bit_and) (simplify (cond (ne @0 integer_zerop) (op:c@2 @1 @3) integer_zerop) - (if (bitwise_equal_p (@0, @3)) + (if (bitwise_equal_p (@0, @3) + /* Cannot make a trapping expression or with one with side + effects unconditional. */ + && !generic_expr_could_trap_p (@1) + && (GIMPLE || !TREE_SIDE_EFFECTS (@1))) @2))) /* Simplifications of shift and rotates. 
*/ diff --git a/gcc/testsuite/gcc.dg/torture/pr116772-1.c b/gcc/testsuite/gcc.dg/torture/pr116772-1.c new file mode 100644 index 000..eedd0398af1 --- /dev/null +++ b/gcc/testsuite/gcc.dg/torture/pr116772-1.c @@ -0,0 +1,24 @@ +/* { dg-do run } */ +/* PR middle-end/116772 */ +/* The division by `/b` should not + be made uncondtional. */ + +int mult0(int a,int b) __attribute__((noipa)); + +int mult0(int a,int b){ + return (b!=0 ? (a/b)*b : 0); +} + +int bit_and0(int a,int b) __attribute__((noipa)); + +int bit_and0(int a,int b){ + return (b!=0 ? (a/b)&b : 0); +} + +int main() { + if (mult0(3, 0) != 0) +__builtin_abort(); + if (bit_and0(3, 0) != 0) +__builtin_abort(); + return 0; +} -- 2.43.0
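The testcase in this patch exercises the trapping half of the new condition. The side-effect half can be illustrated in the same spirit. This is only a minimal sketch: the names `calls`, `next_value` and `guarded_mult` are hypothetical and do not appear in the patch or in the GCC testsuite.

```c
/* Hypothetical illustration, not part of the patch.  The second operand of
   the multiplication has a side effect, so it must stay guarded by the
   `a != 0` test.  */
int calls;

int next_value (void)
{
  calls++;	/* Observable side effect.  */
  return 5;
}

int guarded_mult (int a)
{
  /* Folding this to the unconditional `a * next_value ()` would call
     next_value even when a == 0, which changes the value of `calls`.  */
  return a != 0 ? a * next_value () : 0;
}
```

Calling `guarded_mult (0)` must leave `calls` untouched, while `guarded_mult (2)` increments it once; a fold that drops the guard would break the first property.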
Re: [PATCH] libcpp, v2: Add -Wtrailing-whitespace= warning
On Thu, Sep 19, 2024 at 4:35 PM Jakub Jelinek wrote: > > On Thu, Sep 19, 2024 at 08:17:24AM +0200, Richard Biener wrote: > > On Wed, Sep 18, 2024 at 7:33 PM Jakub Jelinek wrote: > > > > > > On Wed, Sep 18, 2024 at 06:17:58PM +0100, Richard Sandiford wrote: > > > > +1 I'd much rather learn about this kind of error before the code > > > > reaches > > > > a review tool :) > > > > > > > > >From a quick check, it doesn't look like Clang has this, so there is no > > > > existing name to follow. > > > > > > I was considering also -Wtrailing-whitespace, but > > > 1) git diff really warns just about trailing spaces/tabs, not form feeds > > > or > > > vertical tabs > > > 2) gcc source contains tons of spots with form feed in it (though, > > > I think pretty much always as the sole character on a line). > > > And not really sure how people use vertical tabs in the source if at all. > > > Perhaps form feed could be not warned if at end of line if it isn't the > > > sole > > > character on a line... > > > > Generally I like diagnosing this early. For the above I'd say > > -Wtrailing-whitespace= > > with a set of things to diagnose (and a sane default - just spaces and > > tabs - for > > -Wtrailiing-whitespace) would be nice. As for naming possibly follow the > > is{space,blank,cntrl} character classifications? If those are a good > > fit, that is. > > Here is a patch which currently allows blank (' ' '\t') and space (' ' '\t' > '\f' '\v'), cntrl not yet added, not anything non-ASCII, but in theory could > be added later (though, non-ASCII would be just for inside of comments, > say non-breaking space etc. in the source is otherwise an error). > > Bootstrapped/regtested on x86_64-linux and i686-linux. > I think this is getting too complex now; I preferred the simpler version... > 2024-09-19 Jakub Jelinek > > libcpp/ > * include/cpplib.h (struct cpp_options): Add > cpp_warn_trailing_whitespace member. > (enum cpp_warning_reason): Add CPP_W_TRAILING_WHITESPACE. 
> * internal.h (struct _cpp_line_note): Document 'W' line note. > * lex.cc (_cpp_clean_line): Add 'W' line note for trailing whitespace > except for trailing whitespace after backslash. Formatting fix. > (_cpp_process_line_notes): Emit -Wtrailing-whitespace diagnostics. > Formatting fixes. > (lex_raw_string): Clear type on 'W' notes. > gcc/ > * doc/invoke.texi (Wtrailing-whitespace): Document. > gcc/c-family/ > * c.opt (Wtrailing-whitespace=): New option. > (Wtrailing-whitespace): New alias. > gcc/testsuite/ > * c-c++-common/cpp/Wtrailing-whitespace-1.c: New test. > * c-c++-common/cpp/Wtrailing-whitespace-2.c: New test. > * c-c++-common/cpp/Wtrailing-whitespace-3.c: New test. > * c-c++-common/cpp/Wtrailing-whitespace-4.c: New test. > * c-c++-common/cpp/Wtrailing-whitespace-5.c: New test. > > --- libcpp/include/cpplib.h.jj 2024-09-13 16:09:32.690455174 +0200 > +++ libcpp/include/cpplib.h 2024-09-19 16:59:09.674903649 +0200 > @@ -594,6 +594,9 @@ struct cpp_options >/* True if -finput-charset= option has been used explicitly. */ >bool cpp_input_charset_explicit; > > + /* -Wtrailing-whitespace= value. */ > + unsigned char cpp_warn_trailing_whitespace; > + >/* Dependency generation. */ >struct >{ > @@ -709,7 +712,8 @@ enum cpp_warning_reason { >CPP_W_EXPANSION_TO_DEFINED, >CPP_W_BIDIRECTIONAL, >CPP_W_INVALID_UTF8, > - CPP_W_UNICODE > + CPP_W_UNICODE, > + CPP_W_TRAILING_WHITESPACE > }; > > /* Callback for header lookup for HEADER, which is the name of a > --- libcpp/internal.h.jj2024-09-18 09:45:36.832570227 +0200 > +++ libcpp/internal.h 2024-09-19 16:54:56.610321817 +0200 > @@ -318,8 +318,8 @@ struct _cpp_line_note > >/* Type of note. The 9 'from' trigraph characters represent those > trigraphs, '\\' an escaped newline, ' ' an escaped newline with > - intervening space, 0 represents a note that has already been handled, > - and anything else is invalid. 
*/ > + intervening space, 'W' trailing whitespace, 0 represents a note that > + has already been handled, and anything else is invalid. */ >unsigned int type; > }; > > --- libcpp/lex.cc.jj2024-09-13 16:09:32.720454758 +0200 > +++ libcpp/lex.cc 2024-09-19 16:58:37.434339128 +0200 > @@ -928,7 +928,7 @@ _cpp_clean_line (cpp_reader *pfile) > if (p == buffer->next_line || p[-1] != '\\') > break; > > - add_line_note (buffer, p - 1, p != d ? ' ': '\\'); > + add_line_note (buffer, p - 1, p != d ? ' ' : '\\'); > d = p - 2; > buffer->next_line = p - 1; > } > @@ -943,6 +943,20 @@ _cpp_clean_line (cpp_reader *pfile) > } > } > } > + done: > + if (d > buffer->next_line > + && CPP_OPTION (pfile, cpp_warn_trailing_whitespace)) > + switch (
Re: [PATCH] i386: Fix up _mm_min_ss etc. handling of zeros and NaNs [PR116738]
On Thu, Sep 19, 2024 at 10:49 PM Jakub Jelinek wrote: > > Hi! > > min/max patterns for intrinsics which on x86 result in the second > input operand if the two operands are both zeros or one or both of them > are a NaN shouldn't use SMIN/SMAX RTL, because that is similarly to > MIN_EXPR/MAX_EXPR undefined what will be the result in those cases. > > The following patch adds an expander which uses either a new pattern with > UNSPEC_IEEE_M{AX,IN} or use the S{MIN,MAX} representation of the same. > > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? > > P.S. I have a patch to replace UNSPEC_IEEE_M{AX,IN} with IF_THEN_ELSE > (except for the 3dNOW! PFMIN/MAX, those actually are documented to behave > differently), but it actually doesn't improve anything much, as > simplify_const_relational_operation nor simplify_ternary_operation aren't > able to fold comparisons with two CONST_VECTOR operands or IF_THEN_ELSE > with 3 CONST_VECTOR operands. > So, maybe better approach will be to generic fold the builtins with constant > arguments (maybe leaving NaNs to runtime). I think it is still worth it to implement insn patterns with generic RTXes instead of unspecs. Maybe some future improvement to generic RTX simplification will be able to handle them. > > 2024-09-19 Uros Bizjak > Jakub Jelinek > > PR target/116738 > * config/i386/subst.md (mask_scalar_operand_arg34, > mask_scalar_expand_op3, round_saeonly_scalar_mask_arg3): New > subst attributes. > * config/i386/sse.md > (_vm3): > Change from define_insn to define_expand, rename the old define_insn > to ... > (*_vm3): > ... this. > > (_ieee_vm3): > New define_insn. > > * gcc.target/i386/sse-pr116738.c: New test. OK, also for backports. Thanks, Uros. 
> > --- gcc/config/i386/subst.md.jj 2024-09-18 15:49:42.200791315 +0200 > +++ gcc/config/i386/subst.md2024-09-19 12:32:51.048626421 +0200 > @@ -366,6 +366,8 @@ (define_subst_attr "mask_scalar_operand4 > (define_subst_attr "mask_scalarcz_operand4" "mask_scalarcz" "" "%{%5%}%N4") > (define_subst_attr "mask_scalar4_dest_false_dep_for_glc_cond" "mask_scalar" > "1" "operands[4] == CONST0_RTX(mode)") > (define_subst_attr "mask_scalarc_dest_false_dep_for_glc_cond" "mask_scalarc" > "1" "operands[3] == CONST0_RTX(V8HFmode)") > +(define_subst_attr "mask_scalar_operand_arg34" "mask_scalar" "" ", > operands[3], operands[4]") > +(define_subst_attr "mask_scalar_expand_op3" "mask_scalar" "3" "5") > > (define_subst "mask_scalar" >[(set (match_operand:SUBST_V 0) > @@ -473,6 +475,7 @@ (define_subst_attr "round_saeonly_scalar > (define_subst_attr "round_saeonly_scalar_constraint" "round_saeonly_scalar" > "vm" "v") > (define_subst_attr "round_saeonly_scalar_prefix" "round_saeonly_scalar" > "vex" "evex") > (define_subst_attr "round_saeonly_scalar_nimm_predicate" > "round_saeonly_scalar" "nonimmediate_operand" "register_operand") > +(define_subst_attr "round_saeonly_scalar_mask_arg3" "round_saeonly_scalar" > "" ", operands[]") > > (define_subst "round_saeonly_scalar" >[(set (match_operand:SUBST_V 0) > --- gcc/config/i386/sse.md.jj 2024-09-10 16:26:02.875151133 +0200 > +++ gcc/config/i386/sse.md 2024-09-19 12:43:31.693030695 +0200 > @@ -,7 +,27 @@ (define_insn "*ieee_3 >(const_string "*"))) > (set_attr "mode" "")]) > > -(define_insn > "_vm3" > +(define_expand > "_vm3" > + [(set (match_operand:VFH_128 0 "register_operand") > + (vec_merge:VFH_128 > + (smaxmin:VFH_128 > + (match_operand:VFH_128 1 "register_operand") > + (match_operand:VFH_128 2 "nonimmediate_operand")) > +(match_dup 1) > +(const_int 1)))] > + "TARGET_SSE" > +{ > + if (!flag_finite_math_only || flag_signed_zeros) > +{ > + emit_insn > (gen__ieee_vm3 > +(operands[0], operands[1], operands[2] > + > + )); > + DONE; > +} > +}) > 
+ > +(define_insn > "*_vm3" >[(set (match_operand:VFH_128 0 "register_operand" "=x,v") > (vec_merge:VFH_128 > (smaxmin:VFH_128 > @@ -3348,6 +3368,25 @@ (define_insn "_vm3[(set_attr "isa" "noavx,avx") > (set_attr "type" "sse") > (set_attr "btver2_sse_attr" "maxmin") > + (set_attr "prefix" "") > + (set_attr "mode" "")]) > + > +(define_insn > "_ieee_vm3" > + [(set (match_operand:VFH_128 0 "register_operand" "=x,v") > + (vec_merge:VFH_128 > + (unspec:VFH_128 > + [(match_operand:VFH_128 1 "register_operand" "0,v") > +(match_operand:VFH_128 2 "nonimmediate_operand" > "xm,")] > + IEEE_MAXMIN) > +(match_dup 1) > +(const_int 1)))] > + "TARGET_SSE" > + "@ > + \t{%2, %0|%0, %2} > + v\t{%2, > %1, %0|%0, %1, > %2}" > + [(set_attr "isa" "noavx,avx") > + (set_attr "type" "sse") > + (set_attr "btver2_sse_attr" "maxmin") > (set_attr "prefix"
Re: [PATCH] match: Fix `a != 0 ? a * b : 0` patterns for things that trap [PR116772]
On Fri, Sep 20, 2024 at 3:07 AM Andrew Pinski wrote: > > For generic, `a != 0 ? a * b : 0` would match where `b` would be an expression > which trap (in the case of the testcase, it was an integer division but it > could be any). > > This fixes the issue by adding a condition for `(a != 0 ? expr : 0)` to check > for expressions > which have side effects or traps. I think the better fix is to restrict the patterns to GIMPLE - it doesn't look like they were moved over from fold-const.cc? Another option might be to have a way to check that @3 and @1 are "leaf" (aka non-EXPRs and non-REFERENCEs). If you think the issue could be more wide-spread and we want to preserve the folding at GENERIC (it might be useful for SCEV or niter analysis both which eventually add COND_EXPRs...), then can you, instead of repeating > + && !generic_expr_could_trap_p (@3) > + && (GIMPLE || !TREE_SIDE_EFFECTS (@3))) introduce an inline function in {gimple,generic}-match-head.cc for this, say static inline bool no_side_effects (tree t) { return !TREE_SIDE_EFFECTS (t) && !generic_expr_could_trap_p (t); } and on the GIMPLE side return true (and checking-assert we have a is_gimple_val). Thanks, Richard. > PR middle-end/116772 > > gcc/ChangeLog: > > * match.pd (`a != 0 ? a / b : 0`): Add a check to make > sure b does not trap or have side effects. > (`a != 0 ? a * b : 0`, `a != 0 ? a & b : 0`): Likewise. > > gcc/testsuite/ChangeLog: > > * gcc.dg/torture/pr116772-1.c: New test. 
> > Signed-off-by: Andrew Pinski > --- > gcc/match.pd | 12 ++-- > gcc/testsuite/gcc.dg/torture/pr116772-1.c | 24 +++ > 2 files changed, 34 insertions(+), 2 deletions(-) > create mode 100644 gcc/testsuite/gcc.dg/torture/pr116772-1.c > > diff --git a/gcc/match.pd b/gcc/match.pd > index fdb59ff0d44..db46f319c5f 100644 > --- a/gcc/match.pd > +++ b/gcc/match.pd > @@ -4663,7 +4663,11 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) > (simplify >(cond (ne @0 integer_zerop) (op@2 @3 @1) integer_zerop ) > (if (bitwise_equal_p (@0, @3) > -&& tree_expr_nonzero_p (@1)) > +&& tree_expr_nonzero_p (@1) > + /* Cannot make a trapping expression or with one with side > + effects unconditional. */ > + && !generic_expr_could_trap_p (@3) > + && (GIMPLE || !TREE_SIDE_EFFECTS (@3))) > @2))) > > /* Note we prefer the != case here > @@ -4673,7 +4677,11 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) > (for op (mult bit_and) > (simplify >(cond (ne @0 integer_zerop) (op:c@2 @1 @3) integer_zerop) > - (if (bitwise_equal_p (@0, @3)) > + (if (bitwise_equal_p (@0, @3) > + /* Cannot make a trapping expression or with one with side > + effects unconditional. */ > + && !generic_expr_could_trap_p (@1) > + && (GIMPLE || !TREE_SIDE_EFFECTS (@1))) > @2))) > > /* Simplifications of shift and rotates. */ > diff --git a/gcc/testsuite/gcc.dg/torture/pr116772-1.c > b/gcc/testsuite/gcc.dg/torture/pr116772-1.c > new file mode 100644 > index 000..eedd0398af1 > --- /dev/null > +++ b/gcc/testsuite/gcc.dg/torture/pr116772-1.c > @@ -0,0 +1,24 @@ > +/* { dg-do run } */ > +/* PR middle-end/116772 */ > +/* The division by `/b` should not > + be made uncondtional. */ > + > +int mult0(int a,int b) __attribute__((noipa)); > + > +int mult0(int a,int b){ > + return (b!=0 ? (a/b)*b : 0); > +} > + > +int bit_and0(int a,int b) __attribute__((noipa)); > + > +int bit_and0(int a,int b){ > + return (b!=0 ? 
(a/b)&b : 0); > +} > + > +int main() { > + if (mult0(3, 0) != 0) > +__builtin_abort(); > + if (bit_and0(3, 0) != 0) > +__builtin_abort(); > + return 0; > +} > -- > 2.43.0 >
Re: [PATCH RFC] build: update bootstrap req to C++14
On Thu, Sep 19, 2024 at 4:37 PM Jakub Jelinek wrote: > > On Thu, Sep 19, 2024 at 10:21:15AM -0400, Jason Merrill wrote: > > On 9/19/24 7:57 AM, Richard Biener wrote: > > > On Wed, Sep 18, 2024 at 6:22 PM Jason Merrill wrote: > > > > > > > > Tested x86_64-pc-linux-gnu with 5.5.0 bootstrap compiler. Thoughts? > > > > > > I'm fine with this in general - do we have needs of bumping the > > > requirement for > > > GCC 15 though? IMO we should bump once we are requiring actual C++14 > > > in some place. > > > > Jakub's dwarf2asm patch yesterday uses C++14 if available, and I remember > > And libcpp too. > > > seeing a couple of other patches that would have been simpler with C++14 > > available. > > It was just a few lines and if I removed the now never true > HAVE_DESIGNATED_INITIALIZERS cases, it wouldn't even add any new lines, just > change some to others. Both of those patches were just minor optimizations, > it is fine if they don't happen during stage1. > > We also have some spots with > #if __cpp_inline_variables < 201606L > #else > #endif > conditionals but that doesn't mean we need to bump to C++17. > > Sure, bumping the required C++ version means we can remove the corresponding > conditionals, and more importantly stop worrying about working around GCC > 4.8.x/4.9 bugs (I think that is actually more important). > The price is stopping to use some of the cfarm machines for testing or > using IBM Advanced Toolchain or hand-built GCC 14 there as the system > compiler there. > At some point we certainly want to do that, the question is if the benefits > right now overweight the pain. > > > > As of the version requirement as you say only some minor versions of the > > > GCC 5 > > > series are OK I would suggest to say we recommend using GCC 6 or later > > > but GCC 5.5 should also work? > > > > Aren't we already specifying a minor revision with 4.8.3 for C++11? 
> > > > Another possibility would be to just say GCC 5, and adjust that upward if we > > run into problems. > > I think for the oldest supported version we need some CFarm machines around > with that compiler so that all people can actually test issues with it. > Dunno which distros shipped GCC 5 in long term support versions if any and > at which minor those are. At this point in time the relevant remaining LTS codestream at SUSE uses GCC 7 (but also has newer GCC available). The older codestream used GCC 4.8 but also has newer GCC available - being stuck with GCC 13 there though, no future updates planned. So I'm fine with raising the requirement now and documenting the oldest working release; we'd just have to double-check that really does it - for example when we document 5.4 works that might suggest people should go and download & build 5.4 while of course they should instead go and download the newest release that had the same build requirement as 5.4 had - that's why I suggested to document a _recommended_ version plus the oldest version that's known to work if readily available. Richard. > > Jakub >
Re: [PATCH] i386: Fix up _mm_min_ss etc. handling of zeros and NaNs [PR116738]
On Fri, Sep 20, 2024 at 08:01:58AM +0200, Richard Biener wrote:
> > P.S. I have a patch to replace UNSPEC_IEEE_M{AX,IN} with IF_THEN_ELSE
> > (except for the 3dNOW! PFMIN/MAX, those actually are documented to behave
> > differently), but it actually doesn't improve anything much, as
> > simplify_const_relational_operation nor simplify_ternary_operation aren't
> > able to fold comparisons with two CONST_VECTOR operands or IF_THEN_ELSE
> > with 3 CONST_VECTOR operands.
> > So, maybe better approach will be to generic fold the builtins with constant
> > arguments (maybe leaving NaNs to runtime).
>
> It would be possible to fold them in the gimple folding hook to VEC_COND_EXPRs
> with the chance the min/max operation being lost when expanding to RTL.

Sure, but we don't actually pattern recognize

typedef float v4sf __attribute__((vector_size (4 * sizeof (float))));
v4sf
foo (v4sf x, v4sf y)
{
  return x < y ? y : x;
}

back to maxpd etc.  So it wouldn't be an optimization in most cases, at least
until we do that, if the user was looking for such an insn or better with
_mm_max_ps...
Maybe we should.  For scalar ('-Dvector_size(x)=') this is currently matched
in ce2.
Exception-wise, it seems the insns raise Invalid on NaN input (either) and if
y is SNaN, actually propagate it rather than turn it into QNaN, so I think it
is actually an exact match for x < y ? y : x (or x > y ? y : x).

	Jakub
[PATCH] libcpp, v2: Add -Wtrailing-whitespace= warning
On Thu, Sep 19, 2024 at 08:17:24AM +0200, Richard Biener wrote: > On Wed, Sep 18, 2024 at 7:33 PM Jakub Jelinek wrote: > > > > On Wed, Sep 18, 2024 at 06:17:58PM +0100, Richard Sandiford wrote: > > > +1 I'd much rather learn about this kind of error before the code reaches > > > a review tool :) > > > > > > >From a quick check, it doesn't look like Clang has this, so there is no > > > existing name to follow. > > > > I was considering also -Wtrailing-whitespace, but > > 1) git diff really warns just about trailing spaces/tabs, not form feeds or > > vertical tabs > > 2) gcc source contains tons of spots with form feed in it (though, > > I think pretty much always as the sole character on a line). > > And not really sure how people use vertical tabs in the source if at all. > > Perhaps form feed could be not warned if at end of line if it isn't the sole > > character on a line... > > Generally I like diagnosing this early. For the above I'd say > -Wtrailing-whitespace= > with a set of things to diagnose (and a sane default - just spaces and > tabs - for > -Wtrailiing-whitespace) would be nice. As for naming possibly follow the > is{space,blank,cntrl} character classifications? If those are a good > fit, that is. Here is a patch which currently allows blank (' ' '\t') and space (' ' '\t' '\f' '\v'), cntrl not yet added, not anything non-ASCII, but in theory could be added later (though, non-ASCII would be just for inside of comments, say non-breaking space etc. in the source is otherwise an error). Bootstrapped/regtested on x86_64-linux and i686-linux. 2024-09-19 Jakub Jelinek libcpp/ * include/cpplib.h (struct cpp_options): Add cpp_warn_trailing_whitespace member. (enum cpp_warning_reason): Add CPP_W_TRAILING_WHITESPACE. * internal.h (struct _cpp_line_note): Document 'W' line note. * lex.cc (_cpp_clean_line): Add 'W' line note for trailing whitespace except for trailing whitespace after backslash. Formatting fix. 
(_cpp_process_line_notes): Emit -Wtrailing-whitespace diagnostics. Formatting fixes. (lex_raw_string): Clear type on 'W' notes. gcc/ * doc/invoke.texi (Wtrailing-whitespace): Document. gcc/c-family/ * c.opt (Wtrailing-whitespace=): New option. (Wtrailing-whitespace): New alias. gcc/testsuite/ * c-c++-common/cpp/Wtrailing-whitespace-1.c: New test. * c-c++-common/cpp/Wtrailing-whitespace-2.c: New test. * c-c++-common/cpp/Wtrailing-whitespace-3.c: New test. * c-c++-common/cpp/Wtrailing-whitespace-4.c: New test. * c-c++-common/cpp/Wtrailing-whitespace-5.c: New test. --- libcpp/include/cpplib.h.jj 2024-09-13 16:09:32.690455174 +0200 +++ libcpp/include/cpplib.h 2024-09-19 16:59:09.674903649 +0200 @@ -594,6 +594,9 @@ struct cpp_options /* True if -finput-charset= option has been used explicitly. */ bool cpp_input_charset_explicit; + /* -Wtrailing-whitespace= value. */ + unsigned char cpp_warn_trailing_whitespace; + /* Dependency generation. */ struct { @@ -709,7 +712,8 @@ enum cpp_warning_reason { CPP_W_EXPANSION_TO_DEFINED, CPP_W_BIDIRECTIONAL, CPP_W_INVALID_UTF8, - CPP_W_UNICODE + CPP_W_UNICODE, + CPP_W_TRAILING_WHITESPACE }; /* Callback for header lookup for HEADER, which is the name of a --- libcpp/internal.h.jj2024-09-18 09:45:36.832570227 +0200 +++ libcpp/internal.h 2024-09-19 16:54:56.610321817 +0200 @@ -318,8 +318,8 @@ struct _cpp_line_note /* Type of note. The 9 'from' trigraph characters represent those trigraphs, '\\' an escaped newline, ' ' an escaped newline with - intervening space, 0 represents a note that has already been handled, - and anything else is invalid. */ + intervening space, 'W' trailing whitespace, 0 represents a note that + has already been handled, and anything else is invalid. 
*/ unsigned int type; }; --- libcpp/lex.cc.jj2024-09-13 16:09:32.720454758 +0200 +++ libcpp/lex.cc 2024-09-19 16:58:37.434339128 +0200 @@ -928,7 +928,7 @@ _cpp_clean_line (cpp_reader *pfile) if (p == buffer->next_line || p[-1] != '\\') break; - add_line_note (buffer, p - 1, p != d ? ' ': '\\'); + add_line_note (buffer, p - 1, p != d ? ' ' : '\\'); d = p - 2; buffer->next_line = p - 1; } @@ -943,6 +943,20 @@ _cpp_clean_line (cpp_reader *pfile) } } } + done: + if (d > buffer->next_line + && CPP_OPTION (pfile, cpp_warn_trailing_whitespace)) + switch (CPP_OPTION (pfile, cpp_warn_trailing_whitespace)) + { + case 1: + if (ISBLANK (d[-1])) + add_line_note (buffer, d - 1, 'W'); + break; + case 2: + if (IS_NVSPACE (d[-1]) && d[-1]) + add_line_note (buffer, d - 1, 'W'); + break; + } } else { @@ -955,7 +969,6
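The two character classes described in this message can be sketched outside of libcpp as follows. This is not the libcpp implementation: the function name `has_trailing_whitespace` is hypothetical, and the mapping of mode 1 to "blank" and mode 2 to "space" is an assumption read off the `case 1` / `case 2` arms of the hunk above.

```c
#include <stddef.h>

/* Hypothetical sketch, not libcpp code.  MODE 0 disables the check,
   MODE 1 mirrors the "blank" class (' ' and '\t'), MODE 2 mirrors the
   "space" class (' ', '\t', '\f' and '\v').  LINE holds LEN characters
   without the terminating newline.  */
int has_trailing_whitespace (const char *line, size_t len, int mode)
{
  if (len == 0 || mode == 0)
    return 0;

  char c = line[len - 1];
  int blank = c == ' ' || c == '\t';

  if (mode == 1)
    return blank;
  if (mode == 2)
    return blank || c == '\f' || c == '\v';
  return 0;
}
```

With this split, a line ending in a form feed is reported only in mode 2, which matches the concern raised earlier in the thread about form feeds in the GCC sources.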
[PATCH v1 2/2] RISC-V: Add testcases for form 4 of signed scalar SAT_ADD
From: Pan Li

Form 4:
  #define DEF_SAT_S_ADD_FMT_4(T, UT, MIN, MAX)           \
  T __attribute__((noinline))                            \
  sat_s_add_##T##_fmt_4 (T x, T y)                       \
  {                                                      \
    T sum;                                               \
    bool overflow = __builtin_add_overflow (x, y, &sum); \
    return !overflow ? sum : x < 0 ? MIN : MAX;          \
  }

DEF_SAT_S_ADD_FMT_4 (int64_t, uint64_t, INT64_MIN, INT64_MAX)

The tests below pass for this patch.
* The rv64gcv full regression test.

It is a test-only patch and obvious up to a point; I will commit it directly
if there are no comments in the next 48H.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/sat_arith.h: Add test helper macros.
	* gcc.target/riscv/sat_s_add-13.c: New test.
	* gcc.target/riscv/sat_s_add-14.c: New test.
	* gcc.target/riscv/sat_s_add-15.c: New test.
	* gcc.target/riscv/sat_s_add-16.c: New test.
	* gcc.target/riscv/sat_s_add-run-13.c: New test.
	* gcc.target/riscv/sat_s_add-run-14.c: New test.
	* gcc.target/riscv/sat_s_add-run-15.c: New test.
	* gcc.target/riscv/sat_s_add-run-16.c: New test.

Signed-off-by: Pan Li
---
 gcc/testsuite/gcc.target/riscv/sat_arith.h        | 14
 gcc/testsuite/gcc.target/riscv/sat_s_add-13.c     | 30 +
 gcc/testsuite/gcc.target/riscv/sat_s_add-14.c     | 32 +++
 gcc/testsuite/gcc.target/riscv/sat_s_add-15.c     | 31 ++
 gcc/testsuite/gcc.target/riscv/sat_s_add-16.c     | 29 +
 .../gcc.target/riscv/sat_s_add-run-13.c           | 16 ++
 .../gcc.target/riscv/sat_s_add-run-14.c           | 16 ++
 .../gcc.target/riscv/sat_s_add-run-15.c           | 16 ++
 .../gcc.target/riscv/sat_s_add-run-16.c           | 16 ++
 9 files changed, 200 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_s_add-13.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_s_add-14.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_s_add-15.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_s_add-16.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_s_add-run-13.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_s_add-run-14.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_s_add-run-15.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_s_add-run-16.c

diff --git a/gcc/testsuite/gcc.target/riscv/sat_arith.h b/gcc/testsuite/gcc.target/riscv/sat_arith.h index ab141bb1779..a2617b6db70 100644 --- a/gcc/testsuite/gcc.target/riscv/sat_arith.h +++ b/gcc/testsuite/gcc.target/riscv/sat_arith.h @@ -153,6 +153,17 @@ sat_s_add_##T##_fmt_3 (T x, T y) \ #define DEF_SAT_S_ADD_FMT_3_WRAP(T, UT, MIN, MAX) \ DEF_SAT_S_ADD_FMT_3(T, UT, MIN, MAX) +#define DEF_SAT_S_ADD_FMT_4(T, UT, MIN, MAX) \ +T __attribute__((noinline))\ +sat_s_add_##T##_fmt_4 (T x, T y) \ +{ \ + T sum; \ + bool overflow = __builtin_add_overflow (x, y, &sum); \ + return !overflow ? sum : x < 0 ? MIN : MAX; \ +} +#define DEF_SAT_S_ADD_FMT_4_WRAP(T, UT, MIN, MAX) \ + DEF_SAT_S_ADD_FMT_4(T, UT, MIN, MAX) + #define RUN_SAT_S_ADD_FMT_1(T, x, y) sat_s_add_##T##_fmt_1(x, y) #define RUN_SAT_S_ADD_FMT_1_WRAP(T, x, y) RUN_SAT_S_ADD_FMT_1(T, x, y) @@ -162,6 +173,9 @@ sat_s_add_##T##_fmt_3 (T x, T y) \ #define RUN_SAT_S_ADD_FMT_3(T, x, y) sat_s_add_##T##_fmt_3(x, y) #define RUN_SAT_S_ADD_FMT_3_WRAP(T, x, y) RUN_SAT_S_ADD_FMT_3(T, x, y) +#define RUN_SAT_S_ADD_FMT_4(T, x, y) sat_s_add_##T##_fmt_4(x, y) +#define RUN_SAT_S_ADD_FMT_4_WRAP(T, x, y) RUN_SAT_S_ADD_FMT_4(T, x, y) + /**/ /* Saturation Sub (Unsigned and Signed) */ /**/ diff --git a/gcc/testsuite/gcc.target/riscv/sat_s_add-13.c b/gcc/testsuite/gcc.target/riscv/sat_s_add-13.c new file mode 100644 index 000..0923764cde4 --- /dev/null +++ b/gcc/testsuite/gcc.target/riscv/sat_s_add-13.c @@ -0,0 +1,30 @@ +/* { dg-do compile } */ +/* { dg-options "-march=rv64gc -mabi=lp64d -O3 -fdump-rtl-expand-details -fno-schedule-insns -fno-schedule-insns2" } */ +/* { dg-final { check-function-bodies "**" "" } } */ + +#include "sat_arith.h" + +/* +** sat_s_add_int8_t_fmt_4: +** add\s+[atx][0-9]+,\s*a0,\s*a1 +** xor\s+[atx][0-9]+,\s*a0,\s*a1 +** xor\s+[atx][0-9]+,\s*a0,\s*[atx][0-9]+ +** srli\s+[atx][0-9]+,\s*[atx][0-9]+,\s*7 +** srli\s+[atx][0-9]+,\s*[atx][0-9]+,\s*7 +** xori\s+[atx][0-9]+,\s*[atx][0-9]+,\s*1 +** 
and\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+ +** andi\s+[atx][0-9]+,\s*[atx][0-9]+,\s*1 +** srai\s+[atx][0-9]+,\s*[atx][0-9]+,\s*63 +** xori\s+[atx][0-9]+,\s*[atx][0-
[PATCH v1 1/2] RISC-V: Add testcases for form 3 of signed scalar SAT_ADD
From: Pan Li

This patch adds testcases for form 3 of the signed scalar SAT_ADD, that is:

Form 3:
  #define DEF_SAT_S_ADD_FMT_3(T, UT, MIN, MAX)           \
  T __attribute__((noinline))                            \
  sat_s_add_##T##_fmt_3 (T x, T y)                       \
  {                                                      \
    T sum;                                               \
    bool overflow = __builtin_add_overflow (x, y, &sum); \
    return overflow ? x < 0 ? MIN : MAX : sum;           \
  }

DEF_SAT_S_ADD_FMT_3 (int64_t, uint64_t, INT64_MIN, INT64_MAX)

The tests below pass for this patch.
* The rv64gcv full regression test.

It is a test-only patch and obvious up to a point; I will commit it directly
if there are no comments in the next 48H.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/sat_arith.h: Add test helper macros.
	* gcc.target/riscv/sat_s_add-10.c: New test.
	* gcc.target/riscv/sat_s_add-11.c: New test.
	* gcc.target/riscv/sat_s_add-12.c: New test.
	* gcc.target/riscv/sat_s_add-9.c: New test.
	* gcc.target/riscv/sat_s_add-run-10.c: New test.
	* gcc.target/riscv/sat_s_add-run-11.c: New test.
	* gcc.target/riscv/sat_s_add-run-12.c: New test.
	* gcc.target/riscv/sat_s_add-run-9.c: New test.

Signed-off-by: Pan Li --- gcc/testsuite/gcc.target/riscv/sat_arith.h| 14 gcc/testsuite/gcc.target/riscv/sat_s_add-10.c | 32 +++ gcc/testsuite/gcc.target/riscv/sat_s_add-11.c | 31 ++ gcc/testsuite/gcc.target/riscv/sat_s_add-12.c | 29 + gcc/testsuite/gcc.target/riscv/sat_s_add-9.c | 30 + .../gcc.target/riscv/sat_s_add-run-10.c | 16 ++ .../gcc.target/riscv/sat_s_add-run-11.c | 16 ++ .../gcc.target/riscv/sat_s_add-run-12.c | 16 ++ .../gcc.target/riscv/sat_s_add-run-9.c| 16 ++ 9 files changed, 200 insertions(+) create mode 100644 gcc/testsuite/gcc.target/riscv/sat_s_add-10.c create mode 100644 gcc/testsuite/gcc.target/riscv/sat_s_add-11.c create mode 100644 gcc/testsuite/gcc.target/riscv/sat_s_add-12.c create mode 100644 gcc/testsuite/gcc.target/riscv/sat_s_add-9.c create mode 100644 gcc/testsuite/gcc.target/riscv/sat_s_add-run-10.c create mode 100644 gcc/testsuite/gcc.target/riscv/sat_s_add-run-11.c create mode 100644 gcc/testsuite/gcc.target/riscv/sat_s_add-run-12.c create mode 100644 gcc/testsuite/gcc.target/riscv/sat_s_add-run-9.c diff --git a/gcc/testsuite/gcc.target/riscv/sat_arith.h b/gcc/testsuite/gcc.target/riscv/sat_arith.h index b4fbf5dc662..ab141bb1779 100644 --- a/gcc/testsuite/gcc.target/riscv/sat_arith.h +++ b/gcc/testsuite/gcc.target/riscv/sat_arith.h @@ -142,12 +142,26 @@ sat_s_add_##T##_fmt_2 (T x, T y) \ return x < 0 ? MIN : MAX; \ } +#define DEF_SAT_S_ADD_FMT_3(T, UT, MIN, MAX) \ +T __attribute__((noinline))\ +sat_s_add_##T##_fmt_3 (T x, T y) \ +{ \ + T sum; \ + bool overflow = __builtin_add_overflow (x, y, &sum); \ + return overflow ? x < 0 ? 
MIN : MAX : sum; \ +} +#define DEF_SAT_S_ADD_FMT_3_WRAP(T, UT, MIN, MAX) \ + DEF_SAT_S_ADD_FMT_3(T, UT, MIN, MAX) + #define RUN_SAT_S_ADD_FMT_1(T, x, y) sat_s_add_##T##_fmt_1(x, y) #define RUN_SAT_S_ADD_FMT_1_WRAP(T, x, y) RUN_SAT_S_ADD_FMT_1(T, x, y) #define RUN_SAT_S_ADD_FMT_2(T, x, y) sat_s_add_##T##_fmt_2(x, y) #define RUN_SAT_S_ADD_FMT_2_WRAP(T, x, y) RUN_SAT_S_ADD_FMT_2(T, x, y) +#define RUN_SAT_S_ADD_FMT_3(T, x, y) sat_s_add_##T##_fmt_3(x, y) +#define RUN_SAT_S_ADD_FMT_3_WRAP(T, x, y) RUN_SAT_S_ADD_FMT_3(T, x, y) + /**/ /* Saturation Sub (Unsigned and Signed) */ /**/ diff --git a/gcc/testsuite/gcc.target/riscv/sat_s_add-10.c b/gcc/testsuite/gcc.target/riscv/sat_s_add-10.c new file mode 100644 index 000..45329619f9d --- /dev/null +++ b/gcc/testsuite/gcc.target/riscv/sat_s_add-10.c @@ -0,0 +1,32 @@ +/* { dg-do compile } */ +/* { dg-options "-march=rv64gc -mabi=lp64d -O3 -fdump-rtl-expand-details -fno-schedule-insns -fno-schedule-insns2" } */ +/* { dg-final { check-function-bodies "**" "" } } */ + +#include "sat_arith.h" + +/* +** sat_s_add_int16_t_fmt_3: +** add\s+[atx][0-9]+,\s*a0,\s*a1 +** xor\s+[atx][0-9]+,\s*a0,\s*a1 +** xor\s+[atx][0-9]+,\s*a0,\s*[atx][0-9]+ +** srli\s+[atx][0-9]+,\s*[atx][0-9]+,\s*15 +** srli\s+[atx][0-9]+,\s*[atx][0-9]+,\s*15 +** xori\s+[atx][0-9]+,\s*[atx][0-9]+,\s*1 +** and\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+ +** andi\s+[atx][0-9]+,\s*[atx][0-9]+,\s*1 +** srai\s+[atx][0-9]+,\s*[atx][0-9]+,\s*63 +** li\s+[atx][0-9]+,\s*32768 +** addi\s+[atx][0-9]+,\s*[atx][0-9]+,\s*-1 +*
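For readers who want to see what the form 3 and form 4 macros expand to, here is a hand-written int8_t instantiation. It is only a sketch: the `noinline` attribute is dropped for brevity, and `__builtin_add_overflow` is a GCC/Clang builtin, so the code assumes one of those compilers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Form 3 written out for int8_t: saturate when the addition overflows.  */
int8_t sat_s_add_int8_t_fmt_3 (int8_t x, int8_t y)
{
  int8_t sum;
  bool overflow = __builtin_add_overflow (x, y, &sum);
  return overflow ? x < 0 ? INT8_MIN : INT8_MAX : sum;
}

/* Form 4 written out for int8_t: same result, with the condition negated
   and the arms of the outer conditional swapped.  */
int8_t sat_s_add_int8_t_fmt_4 (int8_t x, int8_t y)
{
  int8_t sum;
  bool overflow = __builtin_add_overflow (x, y, &sum);
  return !overflow ? sum : x < 0 ? INT8_MIN : INT8_MAX;
}
```

Both functions return 127 for (127, 1) and -128 for (-128, -1), and the plain sum when no overflow occurs. The sign of `x` is enough to pick the bound because a signed addition can only overflow when both operands have the same sign.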
RE: [PATCH v3 1/2] [APX CFCMOV] Support APX CFCMOV in if_convert pass
> > "Kong, Lingling" writes: > > > Hi, > > > > > > This version has added a new optab named 'cfmovcc'. The new optab is > > > used in the middle end to expand to cfcmov. And simplified my patch > > > by trying to generate the conditional faulting movcc in > > > noce_try_cmove_arith > > function. > > > > > > All the changes passed bootstrap & regtest x86-64-pc-linux-gnu. > > > We also tested spec with SDE and passed the runtime test. > > > > > > Ok for trunk? > > > > > > > > > APX CFCMOV[1] feature implements conditionally faulting which means > > > If the comparison is false, all memory faults are suppressed when > > > load or store a memory operand. Now we could load or store a memory > > > operand may trap or fault for conditional move. > > > > > > In middle-end, now we don't support a conditional move if we knew > > > that a load from A or B could trap or fault. To enable CFCMOV, we > > > added a new optab named cfmovcc. > > > > > > Conditional move suppress fault for condition mem store would not > > > move any arithmetic calculations. For condition mem load now just > > > support a conditional move one trap mem and one no trap and no mem cases. > > > > Sorry if this is going over old ground (I haven't read the earlier versions > > yet), but: > > instead of adding a new optab, could we treat CFCMOV as a scalar > > instance of maskload_optab? Robin is working on adding an "else" > > value for when the condition/mask is false. After that, it would seem > > to be a pretty close match to CFCMOV. > > > > One reason for preferring maskload is that it makes the load an > > explicit part of the interface. We could then potentially use it in gimple > > too, not > just expand. > > > > Yes, for conditional load is like a scalar instance of maskload_optab with > else > operand. > I could try to use maskload_optab to generate cfcmov in rtl ifcvt pass. But > it still > after expand. > Now we don't have if-covert pass for scalar in gimple, do we have plan to do > that ? 
>

Hi,

I have tried to use maskload/maskstore to generate CFCMOV in the ifcvt pass. Unlike movcc, maskload/maskstore are not allowed to FAIL, but I need restrictions for CFCMOV in the backend expander. Since expanding maskload/maskstore cannot fail, I can only place the restrictions in ifcvt and in emit_conditional_move (in optabs.cc).

I'm not sure if this is the right approach; do you have any suggestions?

Thanks,
Lingling

> > Thanks,
> > Richard
> >
> > >
> > >
> > > [1].https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html
> > >
> > > gcc/ChangeLog:
> > >
> > >	* doc/md.texi: Add cfmovcc insn pattern explanation.
> > >	* ifcvt.cc (can_use_cmove_load_mem_notrap): New func
> > >	for conditional faulting movcc for load.
> > >	(can_use_cmove_store_mem_notrap): New func for conditional
> > >	faulting movcc for store.
> > >	(can_use_cfmovcc): New func for conditional faulting.
> > >	(noce_try_cmove_arith): Try to convert to conditional faulting
> > >	movcc.
> > >	(noce_process_if_block): Ditto.
> > >	* optabs.cc (emit_conditional_move): Handle cfmovcc.
> > >	(emit_conditional_move_1): Ditto.
> > >	* optabs.def (OPTAB_D): New optab.
> > > ---
> > >  gcc/doc/md.texi | 10
> > >  gcc/ifcvt.cc    | 119
> > >  gcc/optabs.cc   | 14 +-
> > >  gcc/optabs.def  | 1 +
> > >  4 files changed, 132 insertions(+), 12 deletions(-)
> > >
> > > diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> > > index a9259112251..5f563787c49 100644
> > > --- a/gcc/doc/md.texi
> > > +++ b/gcc/doc/md.texi
> > > @@ -8591,6 +8591,16 @@ Return 1 if operand 1 is a normal floating
> > >  point number and 0 otherwise. @var{m} is a scalar floating point
> > >  mode. Operand 0 has mode @code{SImode}, and operand 1 has mode @var{m}.
> > > +@cindex @code{cfmov@var{mode}cc} instruction pattern @item > > > +@samp{cfmov@var{mode}cc} Similar to @samp{mov@var{mode}cc} but for > > > +conditional faulting, If the comparison is false, all memory faults > > > +are suppressed when load or store a memory operand. > > > + > > > +Conditionally move operand 2 or operand 3 into operand 0 according > > > +to the comparison in operand 1. If the comparison is true, operand > > > +2 is moved into operand 0, otherwise operand 3 is moved. > > > + > > > @end table > > > @end ifset > > > diff --git a/gcc/ifcvt.cc b/gcc/ifcvt.cc index > > > 6487574c514..59845390607 100644 > > > --- a/gcc/ifcvt.cc > > > +++ b/gcc/ifcvt.cc > > > @@ -778,6 +778,9 @@ static bool noce_try_store_flag_mask (struct > > > noce_if_info *); static rtx noce_emit_cmove (struct noce_if_info *, > > > rtx, enum > > rtx_code, rtx, > > > rtx, rtx, rtx, rtx = > > > NULL, rtx = NULL); static bool noce_tr
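To make the discussion above concrete, here is a minimal sketch of the source-level shapes it is about: a load or store that must only happen when the condition holds, because the address may be invalid otherwise. The function names `load_if`, `store_if` and `conditional_access_demo` are hypothetical. Whether GCC turns either function into a CFCMOV depends on the patch under review and on target options, so no claim is made here about the generated code.

```cpp
#include <cassert>

// Conditional load: *p may only be read when cond is true, since p can be
// null (or otherwise unmapped) when cond is false. A plain conditional move
// would need the load to happen unconditionally, which could fault.
int load_if(bool cond, const int *p, int fallback)
{
  return cond ? *p : fallback;
}

// Conditional store: *p may only be written when cond is true.
void store_if(bool cond, int *p, int value)
{
  if (cond)
    *p = value;
}

// Small usage check (hypothetical helper).
inline void conditional_access_demo()
{
  int v = 3;
  assert(load_if(true, &v, 7) == 3);
  assert(load_if(false, nullptr, 7) == 7);  // no dereference happens
  store_if(false, nullptr, 1);              // no store happens
  store_if(true, &v, 9);
  assert(v == 9);
}
```

A conditionally faulting move lets the memory access stay inside the move instruction while suppressing the fault when the condition is false, which is why these shapes could not be if-converted before.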
Re: [PATCH] i386: Fix up _mm_min_ss etc. handling of zeros and NaNs [PR116738]
On Thu, Sep 19, 2024 at 10:50 PM Jakub Jelinek wrote: > > Hi! > > min/max patterns for intrinsics which on x86 result in the second > input operand if the two operands are both zeros or one or both of them > are a NaN shouldn't use SMIN/SMAX RTL, because that is similarly to > MIN_EXPR/MAX_EXPR undefined what will be the result in those cases. > > The following patch adds an expander which uses either a new pattern with > UNSPEC_IEEE_M{AX,IN} or use the S{MIN,MAX} representation of the same. > > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? > > P.S. I have a patch to replace UNSPEC_IEEE_M{AX,IN} with IF_THEN_ELSE > (except for the 3dNOW! PFMIN/MAX, those actually are documented to behave > differently), but it actually doesn't improve anything much, as > simplify_const_relational_operation nor simplify_ternary_operation aren't > able to fold comparisons with two CONST_VECTOR operands or IF_THEN_ELSE > with 3 CONST_VECTOR operands. > So, maybe better approach will be to generic fold the builtins with constant > arguments (maybe leaving NaNs to runtime). It would be possible to fold them in the gimple folding hook to VEC_COND_EXPRs with the chance the min/max operation being lost when expanding to RTL. Richard. > > 2024-09-19 Uros Bizjak > Jakub Jelinek > > PR target/116738 > * config/i386/subst.md (mask_scalar_operand_arg34, > mask_scalar_expand_op3, round_saeonly_scalar_mask_arg3): New > subst attributes. > * config/i386/sse.md > (_vm3): > Change from define_insn to define_expand, rename the old define_insn > to ... > (*_vm3): > ... this. > > (_ieee_vm3): > New define_insn. > > * gcc.target/i386/sse-pr116738.c: New test. 
> > --- gcc/config/i386/subst.md.jj 2024-09-18 15:49:42.200791315 +0200 > +++ gcc/config/i386/subst.md2024-09-19 12:32:51.048626421 +0200 > @@ -366,6 +366,8 @@ (define_subst_attr "mask_scalar_operand4 > (define_subst_attr "mask_scalarcz_operand4" "mask_scalarcz" "" "%{%5%}%N4") > (define_subst_attr "mask_scalar4_dest_false_dep_for_glc_cond" "mask_scalar" > "1" "operands[4] == CONST0_RTX(mode)") > (define_subst_attr "mask_scalarc_dest_false_dep_for_glc_cond" "mask_scalarc" > "1" "operands[3] == CONST0_RTX(V8HFmode)") > +(define_subst_attr "mask_scalar_operand_arg34" "mask_scalar" "" ", > operands[3], operands[4]") > +(define_subst_attr "mask_scalar_expand_op3" "mask_scalar" "3" "5") > > (define_subst "mask_scalar" >[(set (match_operand:SUBST_V 0) > @@ -473,6 +475,7 @@ (define_subst_attr "round_saeonly_scalar > (define_subst_attr "round_saeonly_scalar_constraint" "round_saeonly_scalar" > "vm" "v") > (define_subst_attr "round_saeonly_scalar_prefix" "round_saeonly_scalar" > "vex" "evex") > (define_subst_attr "round_saeonly_scalar_nimm_predicate" > "round_saeonly_scalar" "nonimmediate_operand" "register_operand") > +(define_subst_attr "round_saeonly_scalar_mask_arg3" "round_saeonly_scalar" > "" ", operands[]") > > (define_subst "round_saeonly_scalar" >[(set (match_operand:SUBST_V 0) > --- gcc/config/i386/sse.md.jj 2024-09-10 16:26:02.875151133 +0200 > +++ gcc/config/i386/sse.md 2024-09-19 12:43:31.693030695 +0200 > @@ -,7 +,27 @@ (define_insn "*ieee_3 >(const_string "*"))) > (set_attr "mode" "")]) > > -(define_insn > "_vm3" > +(define_expand > "_vm3" > + [(set (match_operand:VFH_128 0 "register_operand") > + (vec_merge:VFH_128 > + (smaxmin:VFH_128 > + (match_operand:VFH_128 1 "register_operand") > + (match_operand:VFH_128 2 "nonimmediate_operand")) > +(match_dup 1) > +(const_int 1)))] > + "TARGET_SSE" > +{ > + if (!flag_finite_math_only || flag_signed_zeros) > +{ > + emit_insn > (gen__ieee_vm3 > +(operands[0], operands[1], operands[2] > + > + )); > + DONE; > +} > +}) > 
+ > +(define_insn > "*_vm3" >[(set (match_operand:VFH_128 0 "register_operand" "=x,v") > (vec_merge:VFH_128 > (smaxmin:VFH_128 > @@ -3348,6 +3368,25 @@ (define_insn "_vm3[(set_attr "isa" "noavx,avx") > (set_attr "type" "sse") > (set_attr "btver2_sse_attr" "maxmin") > + (set_attr "prefix" "") > + (set_attr "mode" "")]) > + > +(define_insn > "_ieee_vm3" > + [(set (match_operand:VFH_128 0 "register_operand" "=x,v") > + (vec_merge:VFH_128 > + (unspec:VFH_128 > + [(match_operand:VFH_128 1 "register_operand" "0,v") > +(match_operand:VFH_128 2 "nonimmediate_operand" > "xm,")] > + IEEE_MAXMIN) > +(match_dup 1) > +(const_int 1)))] > + "TARGET_SSE" > + "@ > + \t{%2, %0|%0, %2} > + v\t{%2, > %1, %0|%0, %1, > %2}" > + [(set_attr "isa" "noavx,avx") > + (set_attr "type" "sse") > + (set_attr "btver2_sse_attr" "maxmin") > (set_attr "prefix" "") > (set_attr "mode" "")]) > > --- gcc/testsuite/gcc.targ
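The zero/NaN behaviour described at the top of this message can be modelled in scalar code. The sketch below is an assumption-labelled model, not the intrinsic itself: `x86_min_model`, `x86_max_model` and `x86_minmax_demo` are hypothetical names, and the model covers only the low element that `_mm_min_ss` / `_mm_max_ss` operate on.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Scalar model of the low lane: the comparison is false when either operand
// is a NaN and when comparing +0.0 with -0.0, so in those cases the SECOND
// operand is returned, as the message describes.
inline float x86_min_model(float a, float b) { return a < b ? a : b; }
inline float x86_max_model(float a, float b) { return a > b ? a : b; }

// Small usage check (hypothetical helper).
inline void x86_minmax_demo()
{
  const float nan = std::numeric_limits<float>::quiet_NaN();
  // Both zeros: result is the second operand, sign included.
  assert(std::signbit(x86_min_model(0.0f, -0.0f)));
  assert(!std::signbit(x86_min_model(-0.0f, 0.0f)));
  // One NaN: result is the second operand, whichever it is.
  assert(x86_min_model(nan, 1.0f) == 1.0f);
  assert(std::isnan(x86_min_model(1.0f, nan)));
  assert(x86_max_model(2.0f, 1.0f) == 2.0f);
}
```

This asymmetry is what SMIN/SMAX RTL cannot express, since their result is unspecified for exactly these inputs.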
Re: *PING* [PATCH v3 10/10] fortran: Add -finline-intrinsics flag for MINLOC/MAXLOC [PR90608]
On Fri, 13 Sep 2024, Mikael Morin wrote:

> *PING*
>
> Joseph, could you take a quick look at the handling of the new option?
>
> https://gcc.gnu.org/pipermail/gcc-patches/2024-August/661267.html

Individual new options like this are expected to be reviewed by maintainers / reviewers for the relevant part of the compiler, not for option handling which is more for the generic machinery independent of individual options.

--
Joseph S. Myers
josmy...@redhat.com
[COMMITTED] testsuite: fix 'do-do' typos
Fix 'do-do' typos (should be 'dg-do'). No change in logs. gcc/testsuite/ChangeLog: * g++.dg/other/operator2.C: Fix dg-do directive. * gcc.dg/Warray-bounds-67.c: Ditto. * gcc.dg/cpp/builtin-macro-1.c: Ditto. * gcc.dg/tree-ssa/builtin-snprintf-3.c: Ditto. * obj-c++.dg/empty-private-1.mm: Ditto. --- Pushed as obvious. gcc/testsuite/g++.dg/other/operator2.C | 2 +- gcc/testsuite/gcc.dg/Warray-bounds-67.c| 2 +- gcc/testsuite/gcc.dg/cpp/builtin-macro-1.c | 2 +- gcc/testsuite/gcc.dg/tree-ssa/builtin-snprintf-3.c | 8 gcc/testsuite/obj-c++.dg/empty-private-1.mm| 2 +- 5 files changed, 8 insertions(+), 8 deletions(-) diff --git a/gcc/testsuite/g++.dg/other/operator2.C b/gcc/testsuite/g++.dg/other/operator2.C index 358731127186..cd477a64c3f9 100644 --- a/gcc/testsuite/g++.dg/other/operator2.C +++ b/gcc/testsuite/g++.dg/other/operator2.C @@ -1,5 +1,5 @@ // PR c++/28852 -// { do-do compile } +// { dg-do compile } struct A { diff --git a/gcc/testsuite/gcc.dg/Warray-bounds-67.c b/gcc/testsuite/gcc.dg/Warray-bounds-67.c index a9b9ff7d2ab2..354fb89467e3 100644 --- a/gcc/testsuite/gcc.dg/Warray-bounds-67.c +++ b/gcc/testsuite/gcc.dg/Warray-bounds-67.c @@ -2,7 +2,7 @@ of a struct that's a member of either a struct or a union. Both are obviously undefined but GCC relies on these hacks so the test verifies that -Warray-bounds doesn't trigger for it. - { do-do compile } + { dg-do compile } { dg-options "-O2 -Wall" } */ diff --git a/gcc/testsuite/gcc.dg/cpp/builtin-macro-1.c b/gcc/testsuite/gcc.dg/cpp/builtin-macro-1.c index 0f950038d1bd..6fc3c2602785 100644 --- a/gcc/testsuite/gcc.dg/cpp/builtin-macro-1.c +++ b/gcc/testsuite/gcc.dg/cpp/builtin-macro-1.c @@ -5,7 +5,7 @@ the function-like macro expansion it's part of. 
{ dg-do run } - { do-options -no-integrated-cpp } */ + { dg-options -no-integrated-cpp } */ #include diff --git a/gcc/testsuite/gcc.dg/tree-ssa/builtin-snprintf-3.c b/gcc/testsuite/gcc.dg/tree-ssa/builtin-snprintf-3.c index e481955ab732..00ea752c1974 100644 --- a/gcc/testsuite/gcc.dg/tree-ssa/builtin-snprintf-3.c +++ b/gcc/testsuite/gcc.dg/tree-ssa/builtin-snprintf-3.c @@ -1,6 +1,6 @@ /* Verify the lower and upper bounds of floating directives with precision whose range crosses zero. - { do-do compile } + { dg-do compile } { dg-options "-O2 -Wall -fdump-tree-optimized" } */ static const double x = 1.23456789; @@ -72,6 +72,6 @@ int test_g (int p) return n; } -/* { dg-final { scan-tree-dump-times "snprintf" 4 "optimized"} } - { dg-final { scan-tree-dump-not "failure_range" "optimized"} } - { dg-final { scan-tree-dump-times "verify_" 8 "optimized"} } */ +/* { dg-final { scan-tree-dump-times "snprintf" 4 "optimized" } } + { dg-final { scan-tree-dump-not "failure_range" "optimized" } } + { dg-final { scan-tree-dump-times "verify_" 8 "optimized" } } */ diff --git a/gcc/testsuite/obj-c++.dg/empty-private-1.mm b/gcc/testsuite/obj-c++.dg/empty-private-1.mm index b8b90b07ecda..0bbec921b8ec 100644 --- a/gcc/testsuite/obj-c++.dg/empty-private-1.mm +++ b/gcc/testsuite/obj-c++.dg/empty-private-1.mm @@ -1,6 +1,6 @@ /* Test for no entry after @private token. */ -/* { do-do compile } */ +/* { dg-do compile } */ @interface foo { -- 2.46.0
Re: RFC PATCH: contrib/test_summary mode for submitting testsuite results to bunsen
I'd love for (something like) gcc-testresults@ to be usefully searchable (it can be done but... lacks), so please allow me:

On Fri, 13 Sep 2024, Frank Ch. Eigler wrote:
> diff --git a/contrib/test_summary b/contrib/test_summary
> index 5760b053ec27..867ada4d6b81 100755
> --- a/contrib/test_summary
> +++ b/contrib/test_summary
> @@ -39,6 +39,9 @@ if test x"$1" = "x-h"; then
>  should be selected from the log files.
>  -f: force reports to be mailed; if omitted, only reports that differ
>  from the sent.* version are sent.
> + -b: instead of emailing, push test logs into a bunsen git repo
> + -bg REPO: specify the bunsen git repo to override default
> + -bt TAG: specify the bunsen git commit tag to override default
>  _EOF
>  exit 0
>  fi
> @@ -57,6 +60,9 @@ fi
>  : ${filesuffix=}; export filesuffix
>  : ${move=true}; export move
>  : ${forcemail=false}; export forcemail
> +: ${bunsen=false};
> +: ${bunsengit=ssh://sourceware.org/git/bunsendb.git/};
> +: ${bunsentag=`whoami`/gcc/`uname -m`-`date +%Y%m%d-%H%M`};

That uname -m looks like it's an assumption that the report is for a 1) native build that is 2) the same machine as where the git push should happen and 3) all run the same OS. Also, my local account-name may be completely different than what's needed in the tag. Looks like there's a side-question for account names for the bunsendb when you don't have a sourceware account (are rules needed)?

Anyway, please parametrize. Please instead of uname -m scrape the default target identifier from the build.

Use-case: I push cross-build reports from an entirely different machine. My local login may be different.

brgds, H-P
[PATCH] c++: Use type_id_in_expr_sentinel in 6 further spots in the parser
Hi! The following patch uses type_id_in_expr_sentinel in a few spots which did it all manually. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2024-09-19 Jakub Jelinek * parser.cc (cp_parser_postfix_expression): Use type_id_in_expr_sentinel instead of manually saving+setting/restoring parser->in_type_id_in_expr_p around cp_parser_type_id calls. (cp_parser_has_attribute_expression): Likewise. (cp_parser_cast_expression): Likewise. (cp_parser_sizeof_operand): Likewise. --- gcc/cp/parser.cc.jj 2024-09-07 09:31:20.708482757 +0200 +++ gcc/cp/parser.cc2024-09-19 10:46:21.916155154 +0200 @@ -7554,7 +7554,6 @@ cp_parser_postfix_expression (cp_parser tree type; cp_expr expression; const char *saved_message; - bool saved_in_type_id_in_expr_p; /* All of these can be handled in the same way from the point of view of parsing. Begin by consuming the token @@ -7569,11 +7568,11 @@ cp_parser_postfix_expression (cp_parser /* Look for the opening `<'. */ cp_parser_require (parser, CPP_LESS, RT_LESS); /* Parse the type to which we are casting. */ - saved_in_type_id_in_expr_p = parser->in_type_id_in_expr_p; - parser->in_type_id_in_expr_p = true; - type = cp_parser_type_id (parser, CP_PARSER_FLAGS_TYPENAME_OPTIONAL, - NULL); - parser->in_type_id_in_expr_p = saved_in_type_id_in_expr_p; + { + type_id_in_expr_sentinel s (parser); + type = cp_parser_type_id (parser, CP_PARSER_FLAGS_TYPENAME_OPTIONAL, + NULL); + } /* Look for the closing `>'. */ cp_parser_require_end_of_template_parameter_list (parser); /* Restore the old message. */ @@ -7643,7 +7642,6 @@ cp_parser_postfix_expression (cp_parser { tree type; const char *saved_message; - bool saved_in_type_id_in_expr_p; /* Consume the `typeid' token. */ cp_lexer_consume_token (parser->lexer); @@ -7658,10 +7656,10 @@ cp_parser_postfix_expression (cp_parser expression. */ cp_parser_parse_tentatively (parser); /* Try a type-id first. 
*/ - saved_in_type_id_in_expr_p = parser->in_type_id_in_expr_p; - parser->in_type_id_in_expr_p = true; - type = cp_parser_type_id (parser); - parser->in_type_id_in_expr_p = saved_in_type_id_in_expr_p; + { + type_id_in_expr_sentinel s (parser); + type = cp_parser_type_id (parser); + } /* Look for the `)' token. Otherwise, we can't be sure that we're not looking at an expression: consider `typeid (int (3))', for example. */ @@ -7916,10 +7914,8 @@ cp_parser_postfix_expression (cp_parser else { /* Parse the type. */ - bool saved_in_type_id_in_expr_p = parser->in_type_id_in_expr_p; - parser->in_type_id_in_expr_p = true; + type_id_in_expr_sentinel s (parser); type = cp_parser_type_id (parser); - parser->in_type_id_in_expr_p = saved_in_type_id_in_expr_p; parens.require_close (parser); } @@ -9502,11 +9498,11 @@ cp_parser_has_attribute_expression (cp_p expression. */ cp_parser_parse_tentatively (parser); - bool saved_in_type_id_in_expr_p = parser->in_type_id_in_expr_p; - parser->in_type_id_in_expr_p = true; - /* Look for the type-id. */ - oper = cp_parser_type_id (parser); - parser->in_type_id_in_expr_p = saved_in_type_id_in_expr_p; + { +type_id_in_expr_sentinel s (parser); +/* Look for the type-id. */ +oper = cp_parser_type_id (parser); + } cp_parser_parse_definitely (parser); @@ -10268,15 +10264,13 @@ cp_parser_cast_expression (cp_parser *pa cp_parser_simulate_error (parser); else { - bool saved_in_type_id_in_expr_p = parser->in_type_id_in_expr_p; - parser->in_type_id_in_expr_p = true; + type_id_in_expr_sentinel s (parser); /* Look for the type-id. */ type = cp_parser_type_id (parser); /* Look for the closing `)'. */ cp_token *close_paren = parens.require_close (parser); if (close_paren) close_paren_loc = close_paren->location; - parser->in_type_id_in_expr_p = saved_in_type_id_in_expr_p; } /* Restore the saved message. 
*/ @@ -34299,13 +34293,11 @@ cp_parser_sizeof_operand (cp_parser* par cp_parser_simulate_error (parser); else { - bool saved_in_type_id_in_expr_p = parser->in_type_id_in_expr_p; - parser->in_type_id_in_expr_p = true; + type_id_in_expr_sentinel s (parser); /* Look for the type-id. */ type = cp_parser_type_id (parser); /* Look for the closing `)'. */ parens.require_close (parser); -
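The pattern this patch switches to is an ordinary RAII save/set/restore guard. The following is a minimal sketch of the idea using a toy parser type; `toy_parser`, `toy_type_id_sentinel` and `flag_inside_scope` are hypothetical stand-ins written for this note, and the real `type_id_in_expr_sentinel` in gcc/cp may differ in detail.

```cpp
#include <cassert>

// Hypothetical stand-in for cp_parser, reduced to the one flag of interest.
struct toy_parser
{
  bool in_type_id_in_expr_p = false;
};

// Simplified guard in the spirit of type_id_in_expr_sentinel: save the flag
// on construction, set it, and restore it when the scope ends.
struct toy_type_id_sentinel
{
  toy_parser *parser;
  bool saved;

  explicit toy_type_id_sentinel(toy_parser *p, bool set = true)
    : parser(p), saved(p->in_type_id_in_expr_p)
  {
    parser->in_type_id_in_expr_p = set;
  }

  ~toy_type_id_sentinel() { parser->in_type_id_in_expr_p = saved; }

  // Copying a guard would restore the flag twice, so forbid it.
  toy_type_id_sentinel(const toy_type_id_sentinel &) = delete;
  toy_type_id_sentinel &operator=(const toy_type_id_sentinel &) = delete;
};

// Returns the flag value seen while the guard is alive.
bool flag_inside_scope(toy_parser &parser)
{
  toy_type_id_sentinel s(&parser);
  return parser.in_type_id_in_expr_p;
}
```

Compared with the manual save and restore removed by the hunks above, the guard also restores the flag on every early exit from the scope, which is why the extra braces around the `cp_parser_type_id` calls are enough to bound its effect.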
[PATCH v2] MIPS: Add some floating point instructions support for MIPSr6
This patch adds some of the floating-point instructions from MIPS32 Release 6 (mips32r6), with their respective built-in functions and tests:

min_a_s, min_a_d
max_a_s, max_a_d
rint_s, rint_d
class_s, class_d

gcc/ChangeLog:

	* config/mips/i6400.md (i6400_fpu_minmax): Include fclass type.
	(i6400_fpu_fadd): Include frint type.
	* config/mips/mips.cc (AVAIL_NON_MIPS16): Add an entry for
	__builtin_mipsr6_xxx.
	(MIPSR6_BUILTIN_PURE): Same as above.
	(CODE_FOR_mipsr6_min_a_s, CODE_FOR_mipsr6_min_a_d)
	(CODE_FOR_mipsr6_max_a_s, CODE_FOR_mipsr6_max_a_d)
	(CODE_FOR_mipsr6_class_s, CODE_FOR_mipsr6_class_d): New code_aliasing
	macros.
	(mips_builtins): Add mips32r6 min_a_s, min_a_d, max_a_s, max_a_d,
	class_s, class_d builtins.
	* config/mips/mips.h (ISA_HAS_FRINT): Define a new macro.
	(ISA_HAS_FCLASS): Same as above.
	* config/mips/mips.md (UNSPEC_FRINT): New unspec.
	(UNSPEC_FCLASS): Same as above.
	(type): Add frint and fclass.
	(fmin_a_): Generates MINA.fmt instructions.
	(fmax_a_): Generates MAXA.fmt instructions.
	(rint2): Generates RINT.fmt instructions.
	(fclass_): Generates CLASS.fmt instructions.
	* config/mips/p6600.md (p6600_fpu_fadd): Include frint type.
	(p6600_fpu_fabs): Include fclass type.

gcc/testsuite/ChangeLog:

	* gcc.target/mips/mips-class.c: New tests for MIPSr6.
	* gcc.target/mips/mips-minamaxa.c: Same as above.
	* gcc.target/mips/mips-rint.c: Same as above.
Signed-off-by: Jie Mei Co-authored-by: Xi Ruoyao --- gcc/config/mips/i6400.md | 8 +-- gcc/config/mips/mips.cc | 24 + gcc/config/mips/mips.h| 4 ++ gcc/config/mips/mips.md | 52 ++- gcc/config/mips/p6600.md | 8 +-- gcc/testsuite/gcc.target/mips/mips-class.c| 17 ++ gcc/testsuite/gcc.target/mips/mips-minamaxa.c | 31 +++ gcc/testsuite/gcc.target/mips/mips-rint.c | 17 ++ 8 files changed, 151 insertions(+), 10 deletions(-) create mode 100644 gcc/testsuite/gcc.target/mips/mips-class.c create mode 100644 gcc/testsuite/gcc.target/mips/mips-minamaxa.c create mode 100644 gcc/testsuite/gcc.target/mips/mips-rint.c diff --git a/gcc/config/mips/i6400.md b/gcc/config/mips/i6400.md index d6f691ee217..48ce980e1c2 100644 --- a/gcc/config/mips/i6400.md +++ b/gcc/config/mips/i6400.md @@ -219,16 +219,16 @@ (eq_attr "type" "fabs,fneg,fmove")) "i6400_fpu_short, i6400_fpu_apu") -;; min, max +;; min, max, fclass (define_insn_reservation "i6400_fpu_minmax" 2 (and (eq_attr "cpu" "i6400") - (eq_attr "type" "fminmax")) + (eq_attr "type" "fminmax,fclass")) "i6400_fpu_short+i6400_fpu_logic") -;; fadd, fsub, fcvt +;; fadd, fsub, fcvt, frint (define_insn_reservation "i6400_fpu_fadd" 4 (and (eq_attr "cpu" "i6400") - (eq_attr "type" "fadd,fcvt")) + (eq_attr "type" "fadd,fcvt,frint")) "i6400_fpu_long, i6400_fpu_apu") ;; fmul diff --git a/gcc/config/mips/mips.cc b/gcc/config/mips/mips.cc index 173f792bf55..bf1d15b9700 100644 --- a/gcc/config/mips/mips.cc +++ b/gcc/config/mips/mips.cc @@ -15775,6 +15775,7 @@ AVAIL_NON_MIPS16 (dspr2_32, !TARGET_64BIT && TARGET_DSPR2) AVAIL_NON_MIPS16 (loongson, TARGET_LOONGSON_MMI) AVAIL_MIPS16E2_OR_NON_MIPS16 (cache, TARGET_CACHE_BUILTIN) AVAIL_NON_MIPS16 (msa, TARGET_MSA) +AVAIL_NON_MIPS16 (r6, mips_isa_rev >= 6) /* Construct a mips_builtin_description from the given arguments. 
@@ -15940,6 +15941,14 @@ AVAIL_NON_MIPS16 (msa, TARGET_MSA) "__builtin_msa_" #INSN, MIPS_BUILTIN_DIRECT_NO_TARGET,\ FUNCTION_TYPE, mips_builtin_avail_msa, false } +/* Define a MIPSr6 MIPS_BUILTIN_DIRECT pure function __builtin_mipsr6_ + for instruction CODE_FOR_mipsr6_. FUNCTION_TYPE is a builtin_description + field. */ +#define MIPSR6_BUILTIN_PURE(INSN, FUNCTION_TYPE) \ +{ CODE_FOR_mipsr6_ ## INSN, MIPS_FP_COND_f, \ +"__builtin_mipsr6_" #INSN, MIPS_BUILTIN_DIRECT, \ +FUNCTION_TYPE, mips_builtin_avail_r6, true } + #define CODE_FOR_mips_sqrt_ps CODE_FOR_sqrtv2sf2 #define CODE_FOR_mips_addq_ph CODE_FOR_addv2hi3 #define CODE_FOR_mips_addu_qb CODE_FOR_addv4qi3 @@ -16177,6 +16186,13 @@ AVAIL_NON_MIPS16 (msa, TARGET_MSA) #define CODE_FOR_msa_ldi_w CODE_FOR_msa_ldiv4si #define CODE_FOR_msa_ldi_d CODE_FOR_msa_ldiv2di +#define CODE_FOR_mipsr6_min_a_s CODE_FOR_fmin_a_sf +#define CODE_FOR_mipsr6_min_a_d CODE_FOR_fmin_a_df +#define CODE_FOR_mipsr6_max_a_s CODE_FOR_fmax_a_sf +#define CODE_FOR_mipsr6_max_a_d CODE_FOR_fmax_a_df +#define CODE_FOR_mipsr6_class_s CODE_FOR_fclass_sf +#define CODE_FOR_mipsr6_class_d CODE_FOR_fclass_df + static const struct mips_builtin_description mips_builtins[] = { #define MIPS_GET_FCSR 0
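As background for the min_a/max_a builtins, here is a hedged reference model of "minimum/maximum by magnitude". It assumes that MINA.fmt and MAXA.fmt follow the IEEE 754-2008 minNumMag/maxNumMag operations; that assumption is not stated in the patch, and the MIPS architecture manual is the authority. The names `min_a_model`, `max_a_model` and `min_max_a_demo` are hypothetical, and the sketch does not call the `__builtin_mipsr6_*` functions, so it builds on any target.

```cpp
#include <cassert>
#include <cmath>

// Assumed model: pick the operand with the smaller magnitude; when the
// magnitudes are equal (or a NaN is involved) fall back to std::fmin.
inline float min_a_model(float a, float b)
{
  const float fa = std::fabs(a), fb = std::fabs(b);
  if (fa < fb) return a;
  if (fb < fa) return b;
  return std::fmin(a, b);
}

// Assumed model: pick the operand with the larger magnitude; when the
// magnitudes are equal (or a NaN is involved) fall back to std::fmax.
inline float max_a_model(float a, float b)
{
  const float fa = std::fabs(a), fb = std::fabs(b);
  if (fa > fb) return a;
  if (fb > fa) return b;
  return std::fmax(a, b);
}

// Small usage check (hypothetical helper).
inline void min_max_a_demo()
{
  assert(max_a_model(-3.0f, 2.0f) == -3.0f);  // |-3| > |2|
  assert(min_a_model(-3.0f, 2.0f) == 2.0f);
  assert(max_a_model(-2.0f, 2.0f) == 2.0f);   // tie: ordinary maximum
  assert(min_a_model(-2.0f, 2.0f) == -2.0f);  // tie: ordinary minimum
}
```

The point of the model is only that the result keeps the sign of the selected operand, which distinguishes these operations from a plain min/max of absolute values.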
GCC 15: nvptx '-mptx=3.1' multilib variants are deprecated
Hi!

Regarding ongoing maintenance efforts, and to avoid building multilib variants that probably nobody uses apart from a few of us testing these out of routine (via building/linking with explicit '-mptx=3.1'), I propose: "GCC 15: nvptx '-mptx=3.1' multilib variants are deprecated", see attached, "[...], and will be removed in GCC 16". Any objections?

If not, then I'll push this before the GCC 15 release, and promptly after the GCC 15 release apply the corresponding code changes (yet to be implemented). (That is, no actual change for GCC release users for another 1.5 years.)

These '-mptx=3.1' multilib variants are only useful for users of an ancient CUDA/Nvidia Driver that doesn't support GCC's default PTX ISA 6.0 multilib variants; PTX ISA 6.0 is supported as of CUDA 9, 2017-09.

Grüße
 Thomas

From 8c099b2c4fed4f0745ef913c865868e76c061232 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge
Date: Thu, 19 Sep 2024 22:04:28 +0200
Subject: [PATCH] GCC 15: nvptx '-mptx=3.1' multilib variants are deprecated

---
 htdocs/gcc-15/changes.html | 4
 1 file changed, 4 insertions(+)

diff --git a/htdocs/gcc-15/changes.html b/htdocs/gcc-15/changes.html
index 7c372688..99242d2c 100644
--- a/htdocs/gcc-15/changes.html
+++ b/htdocs/gcc-15/changes.html
@@ -191,6 +191,10 @@ a work-in-progress.
 For this, a recent version of https://gcc.gnu.org/install/specific.html#nvptx-x-none"; >nvptx-tools is required.
+
+The -mptx=3.1 multilib variants are deprecated and will be
+removed in GCC 16.
+
--
2.45.2
Re: Re: *PING* [PATCH v3 10/10] fortran: Add -finline-intrinsics flag for MINLOC/MAXLOC [PR90608]
On Mon, Sep 16, 2024 at 10:52:43AM +0200, Mikael Morin wrote:
> > While I understand the intent of 'positive form' vs 'negative form', the
> > above might be clearer as
> >
> > Usage of intrinsics can be implemented either by generating a call
> > to the libgfortran library function or by directly generating inline
> > code. For most intrinsics, only a single variant is available, and
> > there is no choice of implementation. However, some intrinsics can
> > use a library function or inline code, where inline code typically offers
> > opportunities for additional optimization over a library function.
> > With @code{-finline-intrinsics=...} or @code{-fno-inline-intrinsics=...}, the
> > choice applies only to the intrinsics present in the comma-separated list
> > provided as argument.
> >
> > > > +For each intrinsic, if no choice of implementation was made through either of
> > > > +the flag variants, a default behaviour is chosen depending on optimization:
> > > > +library calls are generated when not optimizing or when optimizing for size;
> > > > +otherwise inline code is preferred.
> > > > +
> > > >
> > OK with consideration the above comments.
> >
>
> Harald actually gave a partial green light on this already, but obviously
> there was still room for improvement.
> Thanks for the review, I'm incorporating the changes you suggested.
>
> I was (and still am) waiting for a review from someone knowledgeable in the
> options system. I'm considering proceeding without, as I prefer seeing this
> pushed sooner than later.

Just note lang.opt.urls will need to be updated: either you do it right away with make regenerate-opt-urls, or you commit, wait for a nag-mail from CI, and commit incrementally the patch it creates.

	Jakub
[PATCH 1/2] c++: Don't strip USING_DECLs when updating local bindings [PR116748]
Bootstrapped and regtested on x86_64-pc-linux-gnu, OK for trunk? Alternatively I could solve this the other way around (have a new 'old_target = strip_using_decl (old)' and replace all usages of 'old' except the usages in this patch); this is more churn but probably better matches how other functions are structured. -- >8 -- Currently update_binding strips USING_DECLs too eagerly, leading to ICEs in pop_local_decl as it can't find the decl it's popping in the binding list. Let's rather try to keep the original USING_DECL around. This also means that using59.C can point to the location of the using-decl rather than the underlying object directly; this is in the direction required to fix PR c++/106851 (though more work is needed to emit properly helpful diagnostics here). PR c++/116748 gcc/cp/ChangeLog: * name-lookup.cc (update_binding): Maintain USING_DECLs in the binding slots. gcc/testsuite/ChangeLog: * g++.dg/lookup/using59.C: Update location. * g++.dg/lookup/using69.C: New test. Signed-off-by: Nathaniel Shead --- gcc/cp/name-lookup.cc | 12 +++- gcc/testsuite/g++.dg/lookup/using59.C | 4 ++-- gcc/testsuite/g++.dg/lookup/using69.C | 10 ++ 3 files changed, 19 insertions(+), 7 deletions(-) create mode 100644 gcc/testsuite/g++.dg/lookup/using69.C diff --git a/gcc/cp/name-lookup.cc b/gcc/cp/name-lookup.cc index c7a693e02d5..94b031e6be2 100644 --- a/gcc/cp/name-lookup.cc +++ b/gcc/cp/name-lookup.cc @@ -3005,6 +3005,8 @@ update_binding (cp_binding_level *level, cxx_binding *binding, tree *slot, if (old == error_mark_node) old = NULL_TREE; + + tree old_bval = old; old = strip_using_decl (old); if (DECL_IMPLICIT_TYPEDEF_P (decl)) @@ -3021,7 +3023,7 @@ update_binding (cp_binding_level *level, cxx_binding *binding, tree *slot, gcc_checking_assert (!to_type); hide_type = hiding; to_type = decl; - to_val = old; + to_val = old_bval; } else hide_value = hiding; @@ -3034,7 +3036,7 @@ update_binding (cp_binding_level *level, cxx_binding *binding, tree *slot, /* OLD is an implicit 
typedef. Move it to to_type. */ gcc_checking_assert (!to_type); - to_type = old; + to_type = old_bval; hide_type = hide_value; old = NULL_TREE; hide_value = false; @@ -3093,7 +3095,7 @@ update_binding (cp_binding_level *level, cxx_binding *binding, tree *slot, { if (same_type_p (TREE_TYPE (old), TREE_TYPE (decl))) /* Two type decls to the same type. Do nothing. */ - return old; + return old_bval; else goto conflict; } @@ -3106,7 +3108,7 @@ update_binding (cp_binding_level *level, cxx_binding *binding, tree *slot, /* The new one must be an alias at this point. */ gcc_assert (DECL_NAMESPACE_ALIAS (decl)); - return old; + return old_bval; } else if (TREE_CODE (old) == VAR_DECL) { @@ -3121,7 +3123,7 @@ update_binding (cp_binding_level *level, cxx_binding *binding, tree *slot, else { conflict: - diagnose_name_conflict (decl, old); + diagnose_name_conflict (decl, old_bval); to_val = NULL_TREE; } } diff --git a/gcc/testsuite/g++.dg/lookup/using59.C b/gcc/testsuite/g++.dg/lookup/using59.C index 3c3a73c28d5..b7ec325d234 100644 --- a/gcc/testsuite/g++.dg/lookup/using59.C +++ b/gcc/testsuite/g++.dg/lookup/using59.C @@ -1,10 +1,10 @@ namespace Y { - extern int I; // { dg-message "previous declaration" } + extern int I; } -using Y::I; +using Y::I; // { dg-message "previous declaration" } extern int I; // { dg-error "conflicts with a previous" } extern int J; diff --git a/gcc/testsuite/g++.dg/lookup/using69.C b/gcc/testsuite/g++.dg/lookup/using69.C new file mode 100644 index 000..7d52b73b9ce --- /dev/null +++ b/gcc/testsuite/g++.dg/lookup/using69.C @@ -0,0 +1,10 @@ +// PR c++/116748 + +namespace ns { + struct empty; +} + +void foo() { + using ns::empty; + int empty; +} -- 2.46.0
[PATCH 2/2] c++: Implement resolution for DR 36 [PR116160]
Noticed how to fix this while working on the other patch. Bootstrapped and regtested on x86_64-pc-linux-gnu, OK for trunk? -- >8 -- This implements part of P1787 to no longer complain about redeclaring an entity via using-decl other than in a class scope. PR c++/116160 gcc/cp/ChangeLog: * name-lookup.cc (supplement_binding): Allow redeclaration via USING_DECL if not in class scope. (do_nonmember_using_decl): Remove function-scope exemption. (push_using_decl_bindings): Remove outdated comment. gcc/testsuite/ChangeLog: * g++.dg/cpp0x/using-enum-3.C: No longer expect an error. * g++.dg/lookup/using53.C: Remove XFAIL. * g++.dg/cpp2a/using-enum-11.C: New test. Signed-off-by: Nathaniel Shead --- gcc/cp/name-lookup.cc | 12 +++- gcc/testsuite/g++.dg/cpp0x/using-enum-3.C | 2 +- gcc/testsuite/g++.dg/cpp2a/using-enum-11.C | 9 + gcc/testsuite/g++.dg/lookup/using53.C | 2 +- 4 files changed, 18 insertions(+), 7 deletions(-) create mode 100644 gcc/testsuite/g++.dg/cpp2a/using-enum-11.C diff --git a/gcc/cp/name-lookup.cc b/gcc/cp/name-lookup.cc index 94b031e6be2..22a1c6aac8c 100644 --- a/gcc/cp/name-lookup.cc +++ b/gcc/cp/name-lookup.cc @@ -2874,6 +2874,12 @@ supplement_binding (cxx_binding *binding, tree decl) "%<-std=c++2c%> or %<-std=gnu++2c%>"); binding->value = name_lookup::ambiguous (decl, binding->value); } + else if (binding->scope->kind != sk_class + && TREE_CODE (decl) == USING_DECL + && decls_match (target_bval, target_decl)) +/* Since P1787 (DR 36) it is OK to redeclare entities via using-decl, + except in class scopes. */ +ok = false; else { if (!error_operand_p (bval)) @@ -5375,8 +5381,7 @@ do_nonmember_using_decl (name_lookup &lookup, bool fn_scope_p, else if (value /* Ignore anticipated builtins. 
*/ && !anticipated_builtin_p (value) - && (fn_scope_p - || !decls_match (lookup.value, strip_using_decl (value + && !decls_match (lookup.value, strip_using_decl (value))) { diagnose_name_conflict (lookup.value, value); failed = true; @@ -6648,9 +6653,6 @@ push_using_decl_bindings (name_lookup *lookup, tree name, tree value) type = binding->type; } - /* DR 36 questions why using-decls at function scope may not be - duplicates. Disallow it, as C++11 claimed and PR 20420 - implemented. */ if (lookup) do_nonmember_using_decl (*lookup, true, true, &value, &type); diff --git a/gcc/testsuite/g++.dg/cpp0x/using-enum-3.C b/gcc/testsuite/g++.dg/cpp0x/using-enum-3.C index 34f8bf4fa0b..4638181c63c 100644 --- a/gcc/testsuite/g++.dg/cpp0x/using-enum-3.C +++ b/gcc/testsuite/g++.dg/cpp0x/using-enum-3.C @@ -9,7 +9,7 @@ void f () { enum e { a }; - using e::a; // { dg-error "redeclaration" } + using e::a; // { dg-bogus "redeclaration" "P1787" } // { dg-error "enum" "" { target { ! c++2a } } .-1 } } diff --git a/gcc/testsuite/g++.dg/cpp2a/using-enum-11.C b/gcc/testsuite/g++.dg/cpp2a/using-enum-11.C new file mode 100644 index 000..ff99ed422d5 --- /dev/null +++ b/gcc/testsuite/g++.dg/cpp2a/using-enum-11.C @@ -0,0 +1,9 @@ +// PR c++/116160 +// { dg-do compile { target c++20 } } + +enum class Blah { b }; +void foo() { + using Blah::b; + using Blah::b; + using enum Blah; +} diff --git a/gcc/testsuite/g++.dg/lookup/using53.C b/gcc/testsuite/g++.dg/lookup/using53.C index e91829e939a..8279c73bfc4 100644 --- a/gcc/testsuite/g++.dg/lookup/using53.C +++ b/gcc/testsuite/g++.dg/lookup/using53.C @@ -52,5 +52,5 @@ void f () { using N::i; - using N::i; // { dg-bogus "conflicts" "See P1787 (CWG36)" { xfail *-*-* } } + using N::i; // { dg-bogus "conflicts" "See P1787 (CWG36)" } } -- 2.46.0
Re: [PATCH] Remove PHI_RESULT_PTR and change some PHI_RESULT to be gimple_phi_result [PR116643]
> On 20.09.2024 at 06:02, Andrew Pinski wrote:
>
> There were only a few uses of PHI_RESULT_PTR, so let's remove it and use
> gimple_phi_result_ptr or gimple_phi_result directly instead.
> Since I was modifying ssa-iterators.h for the use of PHI_RESULT_PTR, change
> the use of PHI_RESULT there to be gimple_phi_result instead.
>
> This also removes one extra indirection that was done for PHI_RESULT, so
> stage2 building should be slightly faster.
>
> Bootstrapped and tested on x86_64-linux-gnu.

Ok

Richard

> [...]
[COMMITTED] testsuite: debug: fix errant whitespace
I added some whitespace unintentionally in r15-3723-g284c03ec79ec20, fix that.

gcc/testsuite/ChangeLog:

* gcc.dg/debug/btf/btf-datasec-1.c: Fix whitespace.
---
Pushed as obvious.

 gcc/testsuite/gcc.dg/debug/btf/btf-datasec-1.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/gcc/testsuite/gcc.dg/debug/btf/btf-datasec-1.c b/gcc/testsuite/gcc.dg/debug/btf/btf-datasec-1.c
index 781f80774e2f..4a46479397a6 100644
--- a/gcc/testsuite/gcc.dg/debug/btf/btf-datasec-1.c
+++ b/gcc/testsuite/gcc.dg/debug/btf/btf-datasec-1.c
@@ -1,4 +1,3 @@
-
 /* BTF generation of BTF_KIND_DATASEC records.
 
    We expect 3 DATASEC records: one for each of .data, .rodata and .bss.
-- 
2.46.0
Re: [PATCH v5] c++: deleting explicitly-defaulted functions [PR116162]
On 9/19/24 5:35 PM, Marek Polacek wrote: On Tue, Sep 17, 2024 at 12:50:46PM -0400, Jason Merrill wrote: On 9/16/24 7:14 PM, Marek Polacek wrote: +/* Mark an explicitly defaulted function FN as =deleted and warn. + IMPLICIT_FN is the corresponding special member function that + would have been implicitly declared. */ + +void +maybe_delete_defaulted_fn (tree fn, tree implicit_fn) +{ + if (DECL_ARTIFICIAL (fn) || !DECL_DEFAULTED_IN_CLASS_P (fn)) +return; + + DECL_DELETED_FN (fn) = true; + + if (!warn_defaulted_fn_deleted) +return; The flag shouldn't affect the error cases; I'd drop this check. Dropped. + auto_diagnostic_group d; + const special_function_kind kind = special_function_p (fn); + tree parmtype += TREE_VALUE (DECL_XOBJ_MEMBER_FUNCTION_P (fn) + ? TREE_CHAIN (TYPE_ARG_TYPES (TREE_TYPE (fn))) + : FUNCTION_FIRST_USER_PARMTYPE (fn)); + const bool illformed_p +/* [dcl.fct.def.default] "if F1 is an assignment operator"... */ += (SFK_ASSIGN_P (kind) + /* "and the return type of F1 differs from the return type of F2" */ + && (!same_type_p (TREE_TYPE (TREE_TYPE (fn)), +TREE_TYPE (TREE_TYPE (implicit_fn))) + /* "or F1's non-object parameter type is not a reference, + the program is ill-formed" */ + || !TYPE_REF_P (parmtype))); + /* Decide if we want to emit a pedwarn, error, or a warning. */ + diagnostic_t diag_kind; + if (cxx_dialect >= cxx20) +diag_kind = illformed_p ? DK_ERROR : DK_WARNING; + else +diag_kind = DK_PEDWARN; Error should be errors in all standard modes; it doesn't make sense to have a softer diagnostic in an older mode when it's ill-formed in all. Non-errors should be warnings or pedwarns depending on the standard mode. Aaah, I misunderstood. Hopefully I got it right this time. + /* Don't warn for template instantiations. 
*/ + if (DECL_TEMPLATE_INSTANTIATION (fn) && diag_kind == DK_WARNING) +return; + + const char *wmsg; + switch (kind) +{ +case sfk_copy_constructor: + wmsg = G_("explicitly defaulted copy constructor is implicitly deleted " + "because its declared type does not match the type of an " + "implicit copy constructor"); + break; +case sfk_move_constructor: + wmsg = G_("explicitly defaulted move constructor is implicitly deleted " + "because its declared type does not match the type of an " + "implicit move constructor"); + break; +case sfk_copy_assignment: + wmsg = G_("explicitly defaulted copy assignment operator is implicitly " + "deleted because its declared type does not match the type " + "of an implicit copy assignment operator"); + break; +case sfk_move_assignment: + wmsg = G_("explicitly defaulted move assignment operator is implicitly " + "deleted because its declared type does not match the type " + "of an implicit move assignment operator"); + break; +default: + gcc_unreachable (); +} + if (emit_diagnostic (diag_kind, DECL_SOURCE_LOCATION (fn), + OPT_Wdefaulted_function_deleted, wmsg)) Let's not pass the OPT when DK_ERROR. Done. I've added new tests to cover -Wno-defaulted-function-deleted. Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk? OK. -- >8 -- This PR points out the we're not implementing [dcl.fct.def.default] properly. Consider e.g. struct C { C(const C&&) = default; }; where we wrongly emit an error, but the move ctor should be just =deleted. According to [dcl.fct.def.default], if the type of the special member function differs from the type of the corresponding special member function that would have been implicitly declared in a way other than as allowed by 2.1-4, the function is defined as deleted. There's an exception for assignment operators in which case the program is ill-formed. clang++ has a warning for when we delete an explicitly-defaulted function so this patch adds it too. 
When the code is ill-formed, we emit an error in all modes. Otherwise, we emit a pedwarn in C++17 and a warning in C++20. PR c++/116162 gcc/c-family/ChangeLog: * c.opt (Wdefaulted-function-deleted): New. gcc/cp/ChangeLog: * class.cc (check_bases_and_members): Don't set DECL_DELETED_FN here, leave it to defaulted_late_check. * cp-tree.h (maybe_delete_defaulted_fn): Declare. (defaulted_late_check): Add a tristate parameter. * method.cc (maybe_delete_defaulted_fn): New. (defaulted_late_check): Add a tristate parameter. Call maybe_delete_defaulted_fn instead of giving an error. gcc/ChangeLog: * doc/invoke.texi: Document -Wdefaulted-function-deleted. gcc/testsuite/ChangeLog: * g++.dg/cpp0x/defaulted15.C: Add dg-warning/dg-error. * g++.dg/cpp0x/defaulted51.C: Likewi
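The diagnostic policy stated in the commit message above can be sketched as a small standalone function. This is only an illustration: `toy_diag_kind` and `toy_pick_diag_kind` are hypothetical names, not GCC's `diagnostic_t` or any function in method.cc, and treating every mode before C++20 like C++17 is an assumption taken from the `cxx_dialect >= cxx20` check quoted earlier in the thread.

```cpp
#include <cassert>

// Hypothetical stand-ins for GCC's diagnostic kinds (not the real diagnostic_t).
enum toy_diag_kind { TOY_ERROR, TOY_PEDWARN, TOY_WARNING };

// std_year is an assumed encoding of the language mode: 11, 14, 17, 20, ...
inline toy_diag_kind
toy_pick_diag_kind (bool illformed_p, int std_year)
{
  // Ill-formed code (the assignment-operator cases) is an error in all modes.
  if (illformed_p)
    return TOY_ERROR;
  // Otherwise: a plain warning from C++20 on, a pedwarn in older modes.
  return std_year >= 20 ? TOY_WARNING : TOY_PEDWARN;
}
```

For instance, `toy_pick_diag_kind (false, 17)` yields `TOY_PEDWARN`, while the same call with `true` yields `TOY_ERROR` regardless of the mode.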
[PATCH] i386: Fix up _mm_min_ss etc. handling of zeros and NaNs [PR116738]
Hi! min/max patterns for intrinsics which on x86 result in the second input operand if the two operands are both zeros or one or both of them are a NaN shouldn't use SMIN/SMAX RTL, because that is similarly to MIN_EXPR/MAX_EXPR undefined what will be the result in those cases. The following patch adds an expander which uses either a new pattern with UNSPEC_IEEE_M{AX,IN} or use the S{MIN,MAX} representation of the same. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? P.S. I have a patch to replace UNSPEC_IEEE_M{AX,IN} with IF_THEN_ELSE (except for the 3dNOW! PFMIN/MAX, those actually are documented to behave differently), but it actually doesn't improve anything much, as simplify_const_relational_operation nor simplify_ternary_operation aren't able to fold comparisons with two CONST_VECTOR operands or IF_THEN_ELSE with 3 CONST_VECTOR operands. So, maybe better approach will be to generic fold the builtins with constant arguments (maybe leaving NaNs to runtime). 2024-09-19 Uros Bizjak Jakub Jelinek PR target/116738 * config/i386/subst.md (mask_scalar_operand_arg34, mask_scalar_expand_op3, round_saeonly_scalar_mask_arg3): New subst attributes. * config/i386/sse.md (_vm3): Change from define_insn to define_expand, rename the old define_insn to ... (*_vm3): ... this. (_ieee_vm3): New define_insn. * gcc.target/i386/sse-pr116738.c: New test. 
--- gcc/config/i386/subst.md.jj 2024-09-18 15:49:42.200791315 +0200 +++ gcc/config/i386/subst.md2024-09-19 12:32:51.048626421 +0200 @@ -366,6 +366,8 @@ (define_subst_attr "mask_scalar_operand4 (define_subst_attr "mask_scalarcz_operand4" "mask_scalarcz" "" "%{%5%}%N4") (define_subst_attr "mask_scalar4_dest_false_dep_for_glc_cond" "mask_scalar" "1" "operands[4] == CONST0_RTX(mode)") (define_subst_attr "mask_scalarc_dest_false_dep_for_glc_cond" "mask_scalarc" "1" "operands[3] == CONST0_RTX(V8HFmode)") +(define_subst_attr "mask_scalar_operand_arg34" "mask_scalar" "" ", operands[3], operands[4]") +(define_subst_attr "mask_scalar_expand_op3" "mask_scalar" "3" "5") (define_subst "mask_scalar" [(set (match_operand:SUBST_V 0) @@ -473,6 +475,7 @@ (define_subst_attr "round_saeonly_scalar (define_subst_attr "round_saeonly_scalar_constraint" "round_saeonly_scalar" "vm" "v") (define_subst_attr "round_saeonly_scalar_prefix" "round_saeonly_scalar" "vex" "evex") (define_subst_attr "round_saeonly_scalar_nimm_predicate" "round_saeonly_scalar" "nonimmediate_operand" "register_operand") +(define_subst_attr "round_saeonly_scalar_mask_arg3" "round_saeonly_scalar" "" ", operands[]") (define_subst "round_saeonly_scalar" [(set (match_operand:SUBST_V 0) --- gcc/config/i386/sse.md.jj 2024-09-10 16:26:02.875151133 +0200 +++ gcc/config/i386/sse.md 2024-09-19 12:43:31.693030695 +0200 @@ -,7 +,27 @@ (define_insn "*ieee_3 (const_string "*"))) (set_attr "mode" "")]) -(define_insn "_vm3" +(define_expand "_vm3" + [(set (match_operand:VFH_128 0 "register_operand") + (vec_merge:VFH_128 + (smaxmin:VFH_128 + (match_operand:VFH_128 1 "register_operand") + (match_operand:VFH_128 2 "nonimmediate_operand")) +(match_dup 1) +(const_int 1)))] + "TARGET_SSE" +{ + if (!flag_finite_math_only || flag_signed_zeros) +{ + emit_insn (gen__ieee_vm3 +(operands[0], operands[1], operands[2] + + )); + DONE; +} +}) + +(define_insn "*_vm3" [(set (match_operand:VFH_128 0 "register_operand" "=x,v") (vec_merge:VFH_128 
(smaxmin:VFH_128 @@ -3348,6 +3368,25 @@ (define_insn "_vm3") + (set_attr "mode" "")]) + +(define_insn "_ieee_vm3" + [(set (match_operand:VFH_128 0 "register_operand" "=x,v") + (vec_merge:VFH_128 + (unspec:VFH_128 + [(match_operand:VFH_128 1 "register_operand" "0,v") +(match_operand:VFH_128 2 "nonimmediate_operand" "xm,")] + IEEE_MAXMIN) +(match_dup 1) +(const_int 1)))] + "TARGET_SSE" + "@ + \t{%2, %0|%0, %2} + v\t{%2, %1, %0|%0, %1, %2}" + [(set_attr "isa" "noavx,avx") + (set_attr "type" "sse") + (set_attr "btver2_sse_attr" "maxmin") (set_attr "prefix" "") (set_attr "mode" "")]) --- gcc/testsuite/gcc.target/i386/sse-pr116738.c.jj 2024-09-19 12:52:33.502681950 +0200 +++ gcc/testsuite/gcc.target/i386/sse-pr116738.c2024-09-19 12:54:20.938219741 +0200 @@ -0,0 +1,28 @@ +/* PR target/116738 */ +/* { dg-do run } */ +/* { dg-options "-O2 -msse" } */ +/* { dg-require-effective-target sse } */ + +#include "sse-check.h" + +static inline float +clamp (float f) +{ + __m128 v = _mm_set_ss (f); + __m128 zero = _mm_setzero_ps (); + __m128 greatest = _mm_set_ss (__FLT_MAX__); + v = _mm_min_ss (v, greatest); + v = _mm_max_ss (v, zero); + return _mm_cvtss_f32 (v); +} + +static void +sse
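As a portable illustration of the semantics described in the message above (the second operand is returned when both operands are zeros or when either is a NaN), the scalar behaviour can be modelled with a plain comparison. This is a sketch, not the intrinsic itself: `model_minss`, `model_maxss` and `model_clamp` are hypothetical helper names, and the model assumes the code is built without -ffast-math style options.

```cpp
#include <cassert>
#include <cfloat>
#include <cmath>
#include <limits>

// Model of the low-element result of _mm_min_ss (a, b): the comparison is
// false for NaNs and for equal zeros, so the second operand is selected.
inline float model_minss (float a, float b) { return a < b ? a : b; }

// Same idea for _mm_max_ss (a, b).
inline float model_maxss (float a, float b) { return a > b ? a : b; }

// Mirrors the clamp function from the testcase in the patch.
inline float model_clamp (float f)
{
  float v = model_minss (f, FLT_MAX);
  return model_maxss (v, 0.0f);
}
```

Under this model the operation is not commutative: `model_minss (NaN, 1.0f)` gives `1.0f`, whereas `model_minss (1.0f, NaN)` gives a NaN, which is why a representation that leaves those cases undefined is not a faithful description of the intrinsic.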
Re: [PATCH] ltmain.sh: allow more flags at link-time
Sam James writes:

> Sam James writes:
>
>> libtool defaults to filtering flags passed at link-time.
>>
>> This brings the filtering in GCC's 'fork' of libtool into sync with
>> upstream libtool commit 22a7e547e9857fc94fe5bc7c921d9a4b49c09f8e.
>>
>> In particular, this now allows some harmless diagnostic flags (especially
>> useful for things like -Werror=odr), more optimization flags, and some
>> Clang-specific options.
>>
>> GCC's -flto documentation mentions:
>>> To use the link-time optimizer, -flto and optimization options should be
>>> specified at compile time and during the final link. It is recommended
>>> that you compile all the files participating in the same link with the
>>> same options and also specify those options at link time.
>>
>> This allows compliance with that.
>>
>> * ltmain.sh (func_mode_link): Allow various flags through filter.
>> ---
>> We have been using this for a while now downstream.
>>
>> H.J., please take a look.
>>
>> I think this also explains
>> https://src.fedoraproject.org/rpms/binutils/blob/rawhide/f/binutils.spec#_947.
>>
>> ltmain.sh | 46 ++
>> 1 file changed, 34 insertions(+), 12 deletions(-)
>
> Ping. The change should be harmless given the flags should be filtered
> out earlier if anything is wrong, and we've been using it internally for
> quite some time (i.e. it doesn't *add* the flags, just means that _if
> they arrive_ at libtool, they're not dropped at link-time).
>

Ping.

> [...]
Re: [PATCH] toplevel: Error out if using --disable-libstdcxx with bootstrap [PR105474]
On Thu, Aug 22, 2024 at 2:45 PM Andrew Pinski wrote: > > Bootstrapping and using --disable-libstdcxx will cause a build failure deep > in compiling > stage2 so instead error out early in the toplevel configure so it is more > user friendly. > > Bootstrapped and tested on x86_64-linux-gnu. > Also made sure --disable-libstdcxx without --disable-bootstrap failed. Ping? This is just a simple patch to make it more user friendly and fail early on rather than waiting until the build fails. Thanks, Andrew > > PR bootstrap/105474 > > ChangeLog: > > * configure: Regenerate. > * configure.ac: Error out if libstdc++ is not enabled > with bootstrapping. > > Signed-off-by: Andrew Pinski > --- > configure| 9 + > configure.ac | 9 + > 2 files changed, 18 insertions(+) > > diff --git a/configure b/configure > index 51bf1d1add1..0722242389d 100755 > --- a/configure > +++ b/configure > @@ -10235,6 +10235,15 @@ case "$enable_bootstrap:$ENABLE_GOLD: $configdirs > :,$stage1_languages," in > ;; > esac > > +# Bootstrapping GCC requires libstdc++-v3 so error out if libstdc++ is > disabled with bootstrapping > +# Note C++ is always enabled for stage1 now. > +case "$enable_bootstrap:${noconfigdirs}" in > + yes:*target-libstdc++-v3*) > +as_fn_error $? "bootstrapping with --disable-libstdcxx is not supported" > "$LINENO" 5 > +;; > +esac > + > + > extrasub_build= > for module in ${build_configdirs} ; do >if test -z "${no_recursion}" \ > diff --git a/configure.ac b/configure.ac > index 20457005e29..8be11e84db8 100644 > --- a/configure.ac > +++ b/configure.ac > @@ -3191,6 +3191,15 @@ case "$enable_bootstrap:$ENABLE_GOLD: $configdirs > :,$stage1_languages," in > ;; > esac > > +# Bootstrapping GCC requires libstdc++-v3 so error out if libstdc++ is > disabled with bootstrapping > +# Note C++ is always enabled for stage1 now. 
> +case "$enable_bootstrap:${noconfigdirs}" in > + yes:*target-libstdc++-v3*) > +AC_MSG_ERROR([bootstrapping with --disable-libstdcxx is not supported]) > +;; > +esac > + > + > extrasub_build= > for module in ${build_configdirs} ; do >if test -z "${no_recursion}" \ > -- > 2.43.0 >
[PATCH] Remove PHI_RESULT_PTR and change some PHI_RESULT to be gimple_phi_result [PR116643]
There were only a few uses of PHI_RESULT_PTR, so let's remove it and use
gimple_phi_result_ptr or gimple_phi_result directly instead.
Since I was modifying ssa-iterators.h for the use of PHI_RESULT_PTR, change
the use of PHI_RESULT there to be gimple_phi_result instead.

This also removes one extra indirection that was done for PHI_RESULT, so
stage2 building should be slightly faster.

Bootstrapped and tested on x86_64-linux-gnu.

PR middle-end/116643

gcc/ChangeLog:

* ssa-iterators.h (single_phi_def): Use gimple_phi_result
instead of PHI_RESULT.
(op_iter_init_phidef): Use gimple_phi_result/gimple_phi_result_ptr
instead of PHI_RESULT/PHI_RESULT_PTR.
* tree-ssa-operands.h (PHI_RESULT_PTR): Remove.
(PHI_RESULT): Use gimple_phi_result directly.
(SET_PHI_RESULT): Use gimple_phi_result_ptr directly.

Signed-off-by: Andrew Pinski
---
 gcc/ssa-iterators.h     | 6 +++---
 gcc/tree-ssa-operands.h | 5 ++---
 2 files changed, 5 insertions(+), 6 deletions(-)

diff --git a/gcc/ssa-iterators.h b/gcc/ssa-iterators.h
index b7b01fd018a..e0e555cc472 100644
--- a/gcc/ssa-iterators.h
+++ b/gcc/ssa-iterators.h
@@ -768,7 +768,7 @@ num_ssa_operands (gimple *stmt, int flags)
 inline tree
 single_phi_def (gphi *stmt, int flags)
 {
-  tree def = PHI_RESULT (stmt);
+  tree def = gimple_phi_result (stmt);
   if ((flags & SSA_OP_DEF) && is_gimple_reg (def))
     return def;
   if ((flags & SSA_OP_VIRTUAL_DEFS) && !is_gimple_reg (def))
@@ -811,7 +811,7 @@ op_iter_init_phiuse (ssa_op_iter *ptr, gphi *phi, int flags)
 inline def_operand_p
 op_iter_init_phidef (ssa_op_iter *ptr, gphi *phi, int flags)
 {
-  tree phi_def = PHI_RESULT (phi);
+  tree phi_def = gimple_phi_result (phi);
   int comp;
 
   clear_and_done_ssa_iter (ptr);
@@ -833,7 +833,7 @@ op_iter_init_phidef (ssa_op_iter *ptr, gphi *phi, int flags)
   /* The first call to op_iter_next_def will terminate the iterator since
      all the fields are NULL. Simply return the result here as the first and
      therefore only result. */
-  return PHI_RESULT_PTR (phi);
+  return gimple_phi_result_ptr (phi);
 }
 
 /* Return true is IMM has reached the end of the immediate use stmt list. */
diff --git a/gcc/tree-ssa-operands.h b/gcc/tree-ssa-operands.h
index 8072932564a..b6534f18c66 100644
--- a/gcc/tree-ssa-operands.h
+++ b/gcc/tree-ssa-operands.h
@@ -72,9 +72,8 @@ struct GTY(()) ssa_operands {
 #define USE_OP_PTR(OP) (&((OP)->use_ptr))
 #define USE_OP(OP) (USE_FROM_PTR (USE_OP_PTR (OP)))
 
-#define PHI_RESULT_PTR(PHI) gimple_phi_result_ptr (PHI)
-#define PHI_RESULT(PHI) DEF_FROM_PTR (PHI_RESULT_PTR (PHI))
-#define SET_PHI_RESULT(PHI, V) SET_DEF (PHI_RESULT_PTR (PHI), (V))
+#define PHI_RESULT(PHI) gimple_phi_result (PHI)
+#define SET_PHI_RESULT(PHI, V) SET_DEF (gimple_phi_result_ptr (PHI), (V))
 /*
 #define PHI_ARG_DEF(PHI, I) USE_FROM_PTR (PHI_ARG_DEF_PTR ((PHI), (I)))
 */
-- 
2.34.1
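To illustrate the indirection the patch above removes, here is a toy model of the old and new macro shapes. Everything in it is a simplified stand-in: `toy_tree`, `toy_gphi` and the `toy_`/`OLD_`/`NEW_` names are hypothetical and are not GCC's real types or accessors. It only shows that reading through the pointer accessor and reading the value directly agree, while the setter still needs the address.

```cpp
#include <cassert>

// Toy stand-ins; GCC's real `tree` and `gphi` are far richer.
typedef int toy_tree;
struct toy_gphi { toy_tree result; };

// Pointer accessor and value accessor, in the spirit of
// gimple_phi_result_ptr and gimple_phi_result.
inline toy_tree *toy_phi_result_ptr (toy_gphi *phi) { return &phi->result; }
inline toy_tree toy_phi_result (const toy_gphi *phi) { return phi->result; }

// Old shape: take the address, then dereference it again.
#define OLD_PHI_RESULT(PHI) (*toy_phi_result_ptr (PHI))
// New shape: ask for the value directly.
#define NEW_PHI_RESULT(PHI) toy_phi_result (PHI)
// Setting the result still goes through the pointer accessor.
#define TOY_SET_PHI_RESULT(PHI, V) (*toy_phi_result_ptr (PHI) = (V))

// Small usage helper: store a value and read it back both ways.
inline bool toy_old_and_new_agree (toy_tree value)
{
  toy_gphi phi = { 0 };
  TOY_SET_PHI_RESULT (&phi, value);
  return OLD_PHI_RESULT (&phi) == value && NEW_PHI_RESULT (&phi) == value;
}
```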
Re: [patch, fortran] Implement IANY, IALL and IPARITY for unsigned
On 9/18/24 1:20 PM, Thomas Koenig wrote:
> OK for trunk?

OK and thanks.

Jerry

--- snip ---
Re: [PATCH] vect: Use simple_dce_worklist in the vectorizer [PR116711]
On Tue, Sep 17, 2024 at 11:53 PM Richard Biener wrote: > > On Tue, Sep 17, 2024 at 4:36 AM Andrew Pinski > wrote: > > > > This adds simple_dce_worklist to both the SLP vectorizer and the loop based > > vectorizer. > > This is a step into removing the dce after the loop based vectorizer. That > > DCE still > > does a few things, removing some of the induction variables which has > > become unused. That is > > something which can be improved afterwards. > > > > Note this adds it to the SLP BB vectorizer too as it is used from the loop > > based one sometimes. > > In the case of the BB SLP vectorizer, the dead statements don't get removed > > until much later in > > DSE so removing them much earlier is important. > > > > Note on the new testcase, it came up during bootstrap where the SLP pass > > would cause the need to > > invalidate the scev caches but there was no testcase for this beforehand so > > adding one is a good idea. > > > > Bootstrapped and tested on x86_64-linux-gnu with no regressions. > > In the places you add to the worklist in vectorizable_* can you please > see to do that in a place > where we could actually remove the stmt (and release the def)? Please > also add a > (inline) function like vect_remove_scalar_stmt (vinfo *, X) with X > either a stmt_vec_info (preferred) > or a gimple *. I was thinking about this and the only place where I know 100% that we might be removing the statement is `vec_info::remove_stmt` which also might be just enough to remove all of the scalar cases. Let me try removing the places which call bitmap_set_bit except for that one and report back. Though induction variables might still need to be removed too; I have to dig into that. Thanks, Andrew > > Thanks, > Richard. > > > PR tree-optimization/116711 > > > > gcc/ChangeLog: > > > > * tree-ssa-dce.cc (simple_dce_from_worklist): Returns > > true if something was removed. > > * tree-ssa-dce.h (simple_dce_from_worklist): Change return > > type to bool. 
> > * tree-vect-loop.cc (vectorizable_induction): Add phi result > > to the dce worklist. > > * tree-vect-slp.cc: Add includes of tree-ssa-dce.h, > > tree-ssa-loop-niter.h and tree-scalar-evolution.h. > > (vect_slp_region): Add DCE_WORKLIST argument. Copy > > the dce_worklist from the bb vectorization info. > > (vect_slp_bbs): Add DCE_WORKLIST argument. Update call to > > vect_slp_region. > > (vect_slp_if_converted_bb): Add DCE_WORKLIST argument. Update > > call to vect_slp_bbs. > > (vect_slp_function): Update call to vect_slp_bbs and call > > simple_dce_from_worklist. Also free the loop iteration and > > scev cache if something was removed. > > * tree-vect-stmts.cc (vectorizable_bswap): Add the lhs of the > > scalar stmt > > to the dce work list. > > (vectorizable_call): Likewise. > > (vectorizable_simd_clone_call): Likewise. > > (vectorizable_conversion): Likewise. > > (vectorizable_assignment): Likewise. > > (vectorizable_shift): Likewise. > > (vectorizable_operation): Likewise. > > (vectorizable_condition): Likewise. > > (vectorizable_comparison_1): Likewise. > > * tree-vectorizer.cc: Include tree-ssa-dce.h. > > (vec_info::remove_stmt): Add all of the uses of the store to the > > dce work list. > > (try_vectorize_loop_1): Update call to vect_slp_if_converted_bb. > > Copy the dce worklist into the loop's vectinfo dce worklist. > > (pass_vectorize::execute): Copy loops' vectinfo dce worklist > > locally. > > Add call to simple_dce_from_worklist. > > * tree-vectorizer.h (vec_info): Add dce_worklist field. > > (vect_slp_if_converted_bb): Add bitmap argument. > > * tree-vectorizer.h (vect_slp_if_converted_bb): Add bitmap argument. > > > > gcc/testsuite/ChangeLog: > > > > * gcc.dg/vect/bb-slp-77.c: New test. 
> > > > Signed-off-by: Andrew Pinski > > --- > > gcc/testsuite/gcc.dg/vect/bb-slp-77.c | 15 + > > gcc/tree-ssa-dce.cc | 5 +++-- > > gcc/tree-ssa-dce.h| 2 +- > > gcc/tree-vect-loop.cc | 3 +++ > > gcc/tree-vect-slp.cc | 32 --- > > gcc/tree-vect-stmts.cc| 16 +- > > gcc/tree-vectorizer.cc| 21 +- > > gcc/tree-vectorizer.h | 5 - > > 8 files changed, 85 insertions(+), 14 deletions(-) > > create mode 100644 gcc/testsuite/gcc.dg/vect/bb-slp-77.c > > > > diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-77.c > > b/gcc/testsuite/gcc.dg/vect/bb-slp-77.c > > new file mode 100644 > > index 000..a74bb17e25c > > --- /dev/null > > +++ b/gcc/testsuite/gcc.dg/vect/bb-slp-77.c > > @@ -0,0 +1,15 @@ > > +/* { dg-do compile } */ > > + > > +/* Make sure SLP vectoriz
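The worklist idea discussed in this thread can be sketched on a toy IR: start from values suspected to be dead, remove a defining statement when its value has no remaining uses and no side effects, then revisit its operands. This is a minimal sketch under stated assumptions, not the code in tree-ssa-dce.cc; `toy_stmt`, `toy_dce_from_worklist` and `toy_demo_removed_count` are hypothetical names, and the only detail taken from the ChangeLog is that the function reports whether something was removed.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy IR: statement number i defines value number i.
struct toy_stmt
{
  std::vector<int> operands;  // values this statement uses
  bool side_effects;          // must be kept even if its value is unused
  bool removed;
};

// Remove dead statements reachable from WORKLIST; return true if any
// statement was removed.
inline bool
toy_dce_from_worklist (std::vector<toy_stmt> &stmts, std::vector<int> worklist)
{
  // Count the uses of every value among the live statements.
  std::vector<int> uses (stmts.size (), 0);
  for (std::size_t i = 0; i < stmts.size (); ++i)
    if (!stmts[i].removed)
      for (std::size_t j = 0; j < stmts[i].operands.size (); ++j)
        ++uses[stmts[i].operands[j]];

  bool changed = false;
  while (!worklist.empty ())
    {
      int def = worklist.back ();
      worklist.pop_back ();
      toy_stmt &s = stmts[def];
      if (s.removed || s.side_effects || uses[def] != 0)
        continue;
      s.removed = true;
      changed = true;
      // The operands lose one use each and may have become dead as well.
      for (std::size_t j = 0; j < s.operands.size (); ++j)
        {
          --uses[s.operands[j]];
          worklist.push_back (s.operands[j]);
        }
    }
  return changed;
}

// Demo: v0 has no operands, v1 uses v0, v2 uses v1, and statement 3 uses v0
// and has side effects. Seeding the worklist with v2 removes v2 and v1 only.
inline int toy_demo_removed_count ()
{
  std::vector<toy_stmt> stmts (4);
  for (std::size_t i = 0; i < stmts.size (); ++i)
    stmts[i].side_effects = stmts[i].removed = false;
  stmts[1].operands.push_back (0);
  stmts[2].operands.push_back (1);
  stmts[3].operands.push_back (0);
  stmts[3].side_effects = true;

  toy_dce_from_worklist (stmts, std::vector<int> (1, 2));

  int removed = 0;
  for (std::size_t i = 0; i < stmts.size (); ++i)
    removed += stmts[i].removed ? 1 : 0;
  return removed;
}
```

In the demo, v0 survives because the statement with side effects still uses it, which is the same reason a scalar statement can only be queued safely once its remaining uses are known to go away.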
[PATCH] rs6000, Fix test builtins-1-p10-runnable.c
GCC maintainers:

This patch removes an expected value change that was made to verify the error checking for the test was working. Apparently, it didn't get removed from the final patch.

The patch fixes the single test error in the builtins-1-p10-runnable.c test.

The patch was run on a Power 10.

Please let me know if the patch is acceptable for mainline. Thanks.

Carl Love

-
rs6000, Fix test builtins-1-p10-runnable.c

The first element of the expected result was apparently changed for testing purposes. The change didn't get removed before the commit.

The issue was introduced in commit:

commit f1ad419ebfdcfaf26117e069b10bd1b154276049
Author: Carl Love
Date: Fri Sep 4 19:24:22 2020 -0500

rs6000, vector integer multiply/divide/modulo instructions

Remove the test input.

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/builtins-1-p10-runnable.c: Remove expected
value for testing. Uncomment correct expected result.
---
 gcc/testsuite/gcc.target/powerpc/builtins-1-p10-runnable.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/gcc/testsuite/gcc.target/powerpc/builtins-1-p10-runnable.c b/gcc/testsuite/gcc.target/powerpc/builtins-1-p10-runnable.c
index 222c8b3a409..5402852f82b 100644
--- a/gcc/testsuite/gcc.target/powerpc/builtins-1-p10-runnable.c
+++ b/gcc/testsuite/gcc.target/powerpc/builtins-1-p10-runnable.c
@@ -281,8 +281,7 @@ int main()
   /* Signed word multiply high */
   i_arg1 = (vector int){ 2147483648, 2147483648, 2147483648, 2147483648 };
   i_arg2 = (vector int){ 2, 3, 4, 5};
-  // vec_i_expected = (vector int){-1, -2, -2, -3};
-  vec_i_expected = (vector int){1, -2, -2, -3};
+  vec_i_expected = (vector int){-1, -2, -2, -3};
   vec_i_result = vec_mulh (i_arg1, i_arg2);

-- 
2.46.0
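The restored expected values can be checked with ordinary 64-bit arithmetic. The sketch below models one lane of a signed word multiply-high as the upper 32 bits of the 64-bit product; `model_mulh_word` is a hypothetical helper, not the `vec_mulh` built-in, and it assumes that the literal 2147483648 stored into a signed 32-bit lane ends up as INT32_MIN.

```cpp
#include <cassert>
#include <cstdint>

// Scalar model of one lane: the high 32 bits of the signed 64-bit product,
// i.e. floor (a * b / 2^32).
inline int32_t model_mulh_word (int32_t a, int32_t b)
{
  const int64_t two_pow_32 = INT64_C (1) << 32;
  int64_t product = static_cast<int64_t> (a) * b;
  // C++ division truncates toward zero, so adjust negative remainders to
  // obtain a floor division without relying on shifting negative values.
  int64_t high = product / two_pow_32;
  if (product % two_pow_32 < 0)
    --high;
  return static_cast<int32_t> (high);
}
```

With `a = INT32_MIN`, the multipliers 2, 3, 4 and 5 give -1, -2, -2 and -3, which matches the `{-1, -2, -2, -3}` vector in the diff.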
Re: [PATCH] rs6000, Fix test builtins-1-p10-runnable.c
GCC maintainers:

Please ignore this patch. Attached the wrong patch to the message. Sorry for the noise.

Carl

On 9/19/24 4:40 PM, Carl Love wrote:
> GCC maintainers:
>
> This patch removes an expected value change that was made to verify the
> error checking for the test was working. Apparently, it didn't get
> removed from the final patch.
>
> [...]
[COMMITTED] testsuite: debug: fix dejagnu directive syntax
In this case, they were all harmless in reality (no diff in test logs). gcc/testsuite/ChangeLog: * gcc.dg/debug/btf/btf-array-1.c: Fix dg-do directive syntax. * gcc.dg/debug/btf/btf-bitfields-1.c: Ditto. * gcc.dg/debug/btf/btf-bitfields-2.c: Ditto. * gcc.dg/debug/btf/btf-datasec-1.c: Ditto. * gcc.dg/debug/btf/btf-union-1.c: Ditto. * gcc.dg/debug/ctf/ctf-anonymous-struct-1.c: Ditto. * gcc.dg/debug/ctf/ctf-anonymous-union-1.c: Ditto. * gcc.dg/debug/ctf/ctf-array-1.c: Ditto. * gcc.dg/debug/ctf/ctf-array-2.c: Ditto. * gcc.dg/debug/ctf/ctf-array-4.c: Ditto. * gcc.dg/debug/ctf/ctf-array-5.c: Ditto. * gcc.dg/debug/ctf/ctf-array-6.c: Ditto. * gcc.dg/debug/ctf/ctf-attr-mode-1.c: Ditto. * gcc.dg/debug/ctf/ctf-attr-used-1.c: Ditto. * gcc.dg/debug/ctf/ctf-bitfields-1.c: Ditto. * gcc.dg/debug/ctf/ctf-bitfields-2.c: Ditto. * gcc.dg/debug/ctf/ctf-bitfields-3.c: Ditto. * gcc.dg/debug/ctf/ctf-bitfields-4.c: Ditto. * gcc.dg/debug/ctf/ctf-complex-1.c: Ditto. * gcc.dg/debug/ctf/ctf-cvr-quals-1.c: Ditto. * gcc.dg/debug/ctf/ctf-cvr-quals-2.c: Ditto. * gcc.dg/debug/ctf/ctf-cvr-quals-3.c: Ditto. * gcc.dg/debug/ctf/ctf-cvr-quals-4.c: Ditto. * gcc.dg/debug/ctf/ctf-enum-1.c: Ditto. * gcc.dg/debug/ctf/ctf-enum-2.c: Ditto. * gcc.dg/debug/ctf/ctf-file-scope-1.c: Ditto. * gcc.dg/debug/ctf/ctf-float-1.c: Ditto. * gcc.dg/debug/ctf/ctf-forward-1.c: Ditto. * gcc.dg/debug/ctf/ctf-forward-2.c: Ditto. * gcc.dg/debug/ctf/ctf-func-index-1.c: Ditto. * gcc.dg/debug/ctf/ctf-function-pointers-1.c: Ditto. * gcc.dg/debug/ctf/ctf-function-pointers-2.c: Ditto. * gcc.dg/debug/ctf/ctf-function-pointers-3.c: Ditto. * gcc.dg/debug/ctf/ctf-function-pointers-4.c: Ditto. * gcc.dg/debug/ctf/ctf-functions-1.c: Ditto. * gcc.dg/debug/ctf/ctf-int-1.c: Ditto. * gcc.dg/debug/ctf/ctf-objt-index-1.c: Ditto. * gcc.dg/debug/ctf/ctf-pointers-1.c: Ditto. * gcc.dg/debug/ctf/ctf-pointers-2.c: Ditto. * gcc.dg/debug/ctf/ctf-preamble-1.c: Ditto. * gcc.dg/debug/ctf/ctf-str-table-1.c: Ditto. * gcc.dg/debug/ctf/ctf-struct-1.c: Ditto. 
* gcc.dg/debug/ctf/ctf-struct-2.c: Ditto. * gcc.dg/debug/ctf/ctf-struct-array-1.c: Ditto. * gcc.dg/debug/ctf/ctf-struct-array-2.c: Ditto. * gcc.dg/debug/ctf/ctf-typedef-1.c: Ditto. * gcc.dg/debug/ctf/ctf-typedef-2.c: Ditto. * gcc.dg/debug/ctf/ctf-typedef-3.c: Ditto. * gcc.dg/debug/ctf/ctf-typedef-struct-1.c: Ditto. * gcc.dg/debug/ctf/ctf-typedef-struct-2.c: Ditto. * gcc.dg/debug/ctf/ctf-typedef-struct-3.c: Ditto. * gcc.dg/debug/ctf/ctf-union-1.c: Ditto. * gcc.dg/debug/ctf/ctf-variables-1.c: Ditto. * gcc.dg/debug/ctf/ctf-variables-2.c: Ditto. * gcc.dg/debug/ctf/ctf-variables-3.c: Ditto. --- Pushed as obvious. gcc/testsuite/gcc.dg/debug/btf/btf-array-1.c | 2 +- gcc/testsuite/gcc.dg/debug/btf/btf-bitfields-1.c | 2 +- gcc/testsuite/gcc.dg/debug/btf/btf-bitfields-2.c | 2 +- gcc/testsuite/gcc.dg/debug/btf/btf-datasec-1.c | 3 ++- gcc/testsuite/gcc.dg/debug/btf/btf-union-1.c | 2 +- gcc/testsuite/gcc.dg/debug/ctf/ctf-anonymous-struct-1.c | 2 +- gcc/testsuite/gcc.dg/debug/ctf/ctf-anonymous-union-1.c | 2 +- gcc/testsuite/gcc.dg/debug/ctf/ctf-array-1.c | 2 +- gcc/testsuite/gcc.dg/debug/ctf/ctf-array-2.c | 2 +- gcc/testsuite/gcc.dg/debug/ctf/ctf-array-4.c | 2 +- gcc/testsuite/gcc.dg/debug/ctf/ctf-array-5.c | 2 +- gcc/testsuite/gcc.dg/debug/ctf/ctf-array-6.c | 2 +- gcc/testsuite/gcc.dg/debug/ctf/ctf-attr-mode-1.c | 2 +- gcc/testsuite/gcc.dg/debug/ctf/ctf-attr-used-1.c | 2 +- gcc/testsuite/gcc.dg/debug/ctf/ctf-bitfields-1.c | 2 +- gcc/testsuite/gcc.dg/debug/ctf/ctf-bitfields-2.c | 2 +- gcc/testsuite/gcc.dg/debug/ctf/ctf-bitfields-3.c | 2 +- gcc/testsuite/gcc.dg/debug/ctf/ctf-bitfields-4.c | 2 +- gcc/testsuite/gcc.dg/debug/ctf/ctf-complex-1.c | 2 +- gcc/testsuite/gcc.dg/debug/ctf/ctf-cvr-quals-1.c | 2 +- gcc/testsuite/gcc.dg/debug/ctf/ctf-cvr-quals-2.c | 2 +- gcc/testsuite/gcc.dg/debug/ctf/ctf-cvr-quals-3.c | 2 +- gcc/testsuite/gcc.dg/debug/ctf/ctf-cvr-quals-4.c | 2 +- gcc/testsuite/gcc.dg/debug/ctf/ctf-enum-1.c | 2 +- gcc/testsuite/gcc.dg/debug/ctf/ctf-enum-2.c | 2 +- 
gcc/testsuite/gcc.dg/debug/ctf/ctf-file-scope-1.c| 2 +- gcc/testsuite/gcc.dg/debug/ctf/ctf-float-1.c | 2 +- gcc/testsuite/gcc.dg/debug/ctf/ctf-forward-1.c | 2 +- gcc/testsuite/gcc.dg/debug/ctf/ctf-forward-2.c | 2 +- gcc/testsuite/gcc.dg/debug/ctf/ctf-func-index-1.c|
Re: [C PATCH] fix crash when checking for compatibility of structures [PR116726]
On Tue, 17 Sep 2024, Martin Uecker wrote: > Here is a fix for a mistake I made when recursively checking for > type compatibility. > > > Bootstrapped and regression tested on x86-64. > > > c: fix crash when checking for compatibility of structures [PR116726] > > When checking for compatibility of structure or union types in > tagged_types_tu_compatible_p, restore the old value of the pointer to > the top of the temporary cache after recursively calling > comptypes_internal > when looping over the members of a structure of union. While the next > iteration of the loop overwrites the pointer, I missed the fact that it > can > be accessed again when types of function arguments are compared as part > of recursive type checking and the function is entered again. > > PR c/116726 > > gcc/Changelog: > * c/c-typeck.cc (tagged_types_tu_compatible_p): Restore value > of the cache after recursing into comptypes_internal. > > gcc/testsuite/Changelog: > * pr116726.c: New test. OK. -- Joseph S. Myers josmy...@redhat.com
Re: [PATCH 0/8] [RFC] Introduce floating point fetch_add builtins
On Thu, 19 Sep 2024, mmalcom...@nvidia.com wrote: > 6) Anything special about floating point maths that I'm tripping up on? Correct atomic operations with floating-point operands should ensure that exceptions raised exactly correspond to the operands for which the operation succeeded, and not to the operands for any previous attempts where the compare-exchange failed. There is a lengthy note in the C standard (in C11 it's a footnote in 6.5.16.2, in C17 it's a Note in 6.5.16.2 and in C23 that subclause has become 6.5.17.3) that discusses appropriate code sequences to achieve this. In GCC the implementation of this is in c-typeck.cc:build_atomic_assign, which in turn calls targetm.atomic_assign_expand_fenv (note that we have the complication for C of not introducing libm dependencies in code that only uses standard language features and not , or , so direct use of functions is inappropriate here). I would expect such built-in functions to follow the same semantics for floating-point exceptions as _Atomic compound assignment does. (Note that _Atomic compound assignment is more general in the allowed operands, because compound assignment is a heterogeneous operation - for example, the special floating-point logic in build_atomic_assign includes the case where the LHS of the compound assignment is of atomic integer type but the RHS is of floating type. However, built-in functions allow memory orders other than seq_cst to be used, whereas _Atomic compound assignment is limited to the seq_cst case.) So it would seem appropriate for the implementation of such built-in functions to make use of targetm.atomic_assign_expand_fenv for floating-point environment handling, and for testcases to include tests analogous to c11-atomic-exec-5.c that exceptions are being handled correctly. Cf. 
N2329 which suggested such operations for C in (but tried to do too many things in one paper to be accepted into C); it didn't go into the floating-point exceptions semantics but simple correctness would indicate avoiding spurious exceptions from discarded computations. -- Joseph S. Myers josmy...@redhat.com
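A minimal sketch of the kind of code sequence the C standard's note describes, written here in C++ with std::atomic rather than as GCC's build_atomic_assign expansion. The helper name is hypothetical, and the sketch omits the `#pragma STDC FENV_ACCESS ON` that the standard's version uses, so it assumes the compiler does not move the addition across the floating-point environment calls:

```cpp
#include <atomic>
#include <cfenv>

// Hypothetical helper (not a GCC or libstdc++ function): fetch_add on a
// double that only reports floating-point exceptions from the attempt
// whose compare-exchange succeeded.
double fetch_add_discarding_spurious_exceptions(
    std::atomic<double> &obj, double val,
    std::memory_order order = std::memory_order_seq_cst)
{
    std::fenv_t saved;
    double old_value = obj.load(std::memory_order_relaxed);

    // Save the environment, clear the exception flags and switch to
    // non-stop mode for the duration of the loop.
    std::feholdexcept(&saved);
    for (;;) {
        double new_value = old_value + val;
        if (obj.compare_exchange_strong(old_value, new_value, order,
                                        std::memory_order_relaxed))
            break;
        // The result of this attempt is discarded, so the flags it raised
        // are discarded as well before retrying with the refreshed value.
        std::feclearexcept(FE_ALL_EXCEPT);
    }
    // Restore the saved environment and raise, on top of it, the flags
    // left over from the successful attempt.
    std::feupdateenv(&saved);
    return old_value;
}
```

The point of the structure is that feclearexcept runs only on the failure path, so the flags merged back by feupdateenv belong to the operands that were actually stored.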
[PATCH] testsuite/116784 - match up SLP scan and vectorized scan
The test used vect_perm_short for the vectorized scanning but vect_perm3_short for whether that's done with SLP. We're now generally expecting SLP to be used - even as fallback, so the following adjusts both to match up, fixing the testsuite issue reported for powerpc64. Tested on x86_64-unknown-linux-gnu, aarch64, riscv and powerpc64, pushed. PR testsuite/116784 * gcc.dg/vect/slp-perm-9.c: Use vect_perm_short also for the SLP check. --- gcc/testsuite/gcc.dg/vect/slp-perm-9.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/gcc/testsuite/gcc.dg/vect/slp-perm-9.c b/gcc/testsuite/gcc.dg/vect/slp-perm-9.c index c9468d81a9d..0c3feabf190 100644 --- a/gcc/testsuite/gcc.dg/vect/slp-perm-9.c +++ b/gcc/testsuite/gcc.dg/vect/slp-perm-9.c @@ -58,5 +58,5 @@ int main (int argc, const char* argv[]) /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { { vect_perm_short || vect32 } || vect_load_lanes } } } } */ /* We don't try permutes with a group size of 3 for variable-length vectors. */ -/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect" { target { ! { vect_perm3_short || { vect32 || vect_load_lanes } } } } } } */ -/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { target { vect_perm3_short || { vect32 || vect_load_lanes } } } } } */ +/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect" { target { ! { vect_perm_short || { vect32 || vect_load_lanes } } } } } } */ +/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { target { vect_perm_short || { vect32 || vect_load_lanes } } } } } */ -- 2.43.0
Re: [PATCH v11] ada: fix timeval timespec on 32 bits archs with 64 bits time_t [PR114065]
Marc Poulhiès writes: > Nicolas Boulenguez writes: > >> PR ada/114065 >> >> Hello. >> Any news about these patches? > > Hello, > > Sorry about the delay. Arnaud already replied on BZ, but I'll add a few > remarks. Also, I forgot to mention that your changes don't include any changelog (required for any change in GCC -- see https://gcc.gnu.org/contribute.html for more details). You can scaffold these sections by piping the diff into the `contrib/mklog.py` script and then filling in the gaps. Marc
[PATCH v4] match: Fix A || B not optimized to true when !B implies A [PR114326]
From: kelefth In expressions like (a != b || ((a ^ b) & c) == d) and (a != b || (a ^ b) == c), (a ^ b) is folded to false. In the equivalent expressions (((a ^ b) & c) == d || a != b) and ((a ^ b) == c || a != b) this is not happening. This patch adds the following simplifications in match.pd: ((a ^ b) & c) cmp d || a != b --> 0 cmp d || a != b (a ^ b) cmp c || a != b --> 0 cmp c || a != b PR tree-optimization/114326 gcc/ChangeLog: * match.pd: Add two patterns to fold a ^ b to 0, when a == b. gcc/testsuite/ChangeLog: * gcc.dg/tree-ssa/fold-xor-and-or.c: New test. * gcc.dg/tree-ssa/fold-xor-or.c: New test. Tested-by: Christoph Müllner Signed-off-by: Philipp Tomsich Signed-off-by: Konstantinos Eleftheriou --- gcc/match.pd | 32 ++- .../gcc.dg/tree-ssa/fold-xor-and-or.c | 55 +++ gcc/testsuite/gcc.dg/tree-ssa/fold-xor-or.c | 55 +++ 3 files changed, 141 insertions(+), 1 deletion(-) create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/fold-xor-and-or.c create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/fold-xor-or.c diff --git a/gcc/match.pd b/gcc/match.pd index 4aa610e2270..3c4f9b5f774 100644 --- a/gcc/match.pd +++ b/gcc/match.pd @@ -3761,6 +3761,36 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) (if (types_match (type, TREE_TYPE (@0))) (bit_xor @0 { build_one_cst (type); } )) +/* ((a ^ b) & c) cmp d || a != b --> (0 cmp d || a != b). */ +(for cmp (simple_comparison) + (simplify +(bit_ior + (cmp:c + (bit_and:c + (bit_xor:c @0 @1) + tree_expr_nonzero_p@2) + @3) + (ne:c@4 @0 @1)) +(bit_ior + (cmp + { build_zero_cst (TREE_TYPE (@0)); } + @3) + @4))) + +/* (a ^ b) cmp c || a != b --> (0 cmp c || a != b). */ +(for cmp (simple_comparison) + (simplify +(bit_ior + (cmp:c + (bit_xor:c @0 @1) + @2) + (ne:c@3 @0 @1)) +(bit_ior + (cmp + { build_zero_cst (TREE_TYPE (@0)); } + @2) + @3))) + /* We can't reassociate at all for saturating types. 
*/ (if (!TYPE_SATURATING (type)) @@ -10763,4 +10793,4 @@ and, } } (if (full_perm_p) - (vec_perm (op@3 @0 @1) @3 @2)) + (vec_perm (op@3 @0 @1) @3 @2)) \ No newline at end of file diff --git a/gcc/testsuite/gcc.dg/tree-ssa/fold-xor-and-or.c b/gcc/testsuite/gcc.dg/tree-ssa/fold-xor-and-or.c new file mode 100644 index 000..e5dc98e7541 --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/fold-xor-and-or.c @@ -0,0 +1,55 @@ +/* { dg-do compile } */ +/* { dg-options "-O3 -fdump-tree-optimized" } */ + +typedef unsigned long int uint64_t; + +int cmp1(int d1, int d2) { + if (((d1 ^ d2) & 0xabcd) == 0 || d1 != d2) +return 0; + return 1; +} + +int cmp2(int d1, int d2) { + if (d1 != d2 || ((d1 ^ d2) & 0xabcd) == 0) +return 0; + return 1; +} + +int cmp3(int d1, int d2) { + if (10 > (0xabcd & (d2 ^ d1)) || d2 != d1) +return 0; + return 1; +} + +int cmp4(int d1, int d2) { + if (d2 != d1 || 10 > (0xabcd & (d2 ^ d1))) +return 0; + return 1; +} + +int cmp1_64(uint64_t d1, uint64_t d2) { + if (((d1 ^ d2) & 0xabcd) == 0 || d1 != d2) +return 0; + return 1; +} + +int cmp2_64(uint64_t d1, uint64_t d2) { + if (d1 != d2 || ((d1 ^ d2) & 0xabcd) == 0) +return 0; + return 1; +} + +int cmp3_64(uint64_t d1, uint64_t d2) { + if (10 > (0xabcd & (d2 ^ d1)) || d2 != d1) +return 0; + return 1; +} + +int cmp4_64(uint64_t d1, uint64_t d2) { + if (d2 != d1 || 10 > (0xabcd & (d2 ^ d1))) +return 0; + return 1; +} + +/* The if should be removed, so the condition should not exist */ +/* { dg-final { scan-tree-dump-not "d1_\[0-9\]+.D. \\^ d2_\[0-9\]+.D." 
"optimized" } } */ diff --git a/gcc/testsuite/gcc.dg/tree-ssa/fold-xor-or.c b/gcc/testsuite/gcc.dg/tree-ssa/fold-xor-or.c new file mode 100644 index 000..c55cfbcc84c --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/fold-xor-or.c @@ -0,0 +1,55 @@ +/* { dg-do compile } */ +/* { dg-options "-O3 -fdump-tree-optimized" } */ + +typedef unsigned long int uint64_t; + +int cmp1(int d1, int d2) { + if ((d1 ^ d2) == 0xabcd || d1 != d2) +return 0; + return 1; +} + +int cmp2(int d1, int d2) { + if (d1 != d2 || (d1 ^ d2) == 0xabcd) +return 0; + return 1; +} + +int cmp3(int d1, int d2) { + if (0xabcd > (d2 ^ d1) || d2 != d1) +return 0; + return 1; +} + +int cmp4(int d1, int d2) { + if (d2 != d1 || 0xabcd > (d2 ^ d1)) +return 0; + return 1; +} + +int cmp1_64(uint64_t d1, uint64_t d2) { + if ((d1 ^ d2) == 0xabcd || d1 != d2) +return 0; + return 1; +} + +int cmp2_64(uint64_t d1, uint64_t d2) { + if (d1 != d2 || (d1 ^ d2) == 0xabcd) +return 0; + return 1; +} + +int cmp3_64(uint64_t d1, uint64_t d2) { + if (0xabcd > (d2 ^ d1) || d2 != d1) +return 0; + return 1; +} + +int cmp4_64(uint64_t d1, uint64_t d2) { + if (d2 != d1 || 0xabcd > (d2 ^ d1)) +return 0; + return 1; +} + +/* The if should
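To see why replacing (a ^ b) with 0 is safe in the two patterns this patch describes, here is a small brute-force check in C++. It is only a sketch of the source-level identity, not of the match.pd machinery; all function names are hypothetical, and only `==` and `<` stand in for the comparison codes the patterns iterate over:

```cpp
// First pattern: ((a ^ b) & c) cmp d || a != b  -->  0 cmp d || a != b
bool and_form_before(int a, int b, int c, int d)
{
    return ((a ^ b) & c) == d || a != b;
}

bool and_form_after(int a, int b, int d)
{
    return 0 == d || a != b;
}

// Second pattern: (a ^ b) cmp c || a != b  -->  0 cmp c || a != b
bool xor_form_before(int a, int b, int c)
{
    return (a ^ b) < c || a != b;
}

bool xor_form_after(int a, int b, int c)
{
    return 0 < c || a != b;
}

// Exhaustively compare the original and folded forms over [lo, hi].
bool folds_agree(int lo, int hi)
{
    for (int a = lo; a <= hi; ++a)
        for (int b = lo; b <= hi; ++b)
            for (int c = lo; c <= hi; ++c) {
                if (xor_form_before(a, b, c) != xor_form_after(a, b, c))
                    return false;
                for (int d = lo; d <= hi; ++d)
                    if (and_form_before(a, b, c, d) != and_form_after(a, b, d))
                        return false;
            }
    return true;
}
```

The argument is the usual case split: when a != b the right-hand operand already makes the whole expression true, and when a == b the XOR is zero, so the left-hand operand can be evaluated as if it were.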
[Fortran, Patch, PR101100, v1] Fix ICE when compiling with caf-lib and using proc_pointer component.
Hi all, the attached patch fixes an ICE when compiling with -fcoarray=lib and using (proc_-)pointer component in a coarray. The code was looking at the wrong location for the caf-token. Regtests ok on x86_64-pc-linux-gnu / Fedora 39. Ok for mainline? Regards, Andre -- Andre Vehreschild * Email: vehre ad gmx dot de From 5115201ea3eb9caf673adce89c49e953cb46c375 Mon Sep 17 00:00:00 2001 From: Andre Vehreschild Date: Wed, 18 Sep 2024 15:55:28 +0200 Subject: [PATCH] Fortran: Allow to nullify caf token when not in ultimate component. [PR101100] gcc/fortran/ChangeLog: PR fortran/101100 * trans-expr.cc (trans_caf_token_assign): Take caf-token from decl for non ultimate coarray components. gcc/testsuite/ChangeLog: * gfortran.dg/coarray/proc_pointer_assign_1.f90: New test. --- gcc/fortran/trans-expr.cc | 8 - .../coarray/proc_pointer_assign_1.f90 | 29 +++ 2 files changed, 36 insertions(+), 1 deletion(-) create mode 100644 gcc/testsuite/gfortran.dg/coarray/proc_pointer_assign_1.f90 diff --git a/gcc/fortran/trans-expr.cc b/gcc/fortran/trans-expr.cc index 54901c33139..18ef5e246ce 100644 --- a/gcc/fortran/trans-expr.cc +++ b/gcc/fortran/trans-expr.cc @@ -10371,7 +10371,13 @@ trans_caf_token_assign (gfc_se *lse, gfc_se *rse, gfc_expr *expr1, else if (lhs_attr.codimension) { lhs_tok = gfc_get_ultimate_alloc_ptr_comps_caf_token (lse, expr1); - lhs_tok = build_fold_indirect_ref (lhs_tok); + if (!lhs_tok) + { + lhs_tok = gfc_get_tree_for_caf_expr (expr1); + lhs_tok = GFC_TYPE_ARRAY_CAF_TOKEN (TREE_TYPE (lhs_tok)); + } + else + lhs_tok = build_fold_indirect_ref (lhs_tok); tmp = build2_loc (input_location, MODIFY_EXPR, void_type_node, lhs_tok, null_pointer_node); gfc_prepend_expr_to_block (&lse->post, tmp); diff --git a/gcc/testsuite/gfortran.dg/coarray/proc_pointer_assign_1.f90 b/gcc/testsuite/gfortran.dg/coarray/proc_pointer_assign_1.f90 new file mode 100644 index 000..81f0c3b19cf --- /dev/null +++ b/gcc/testsuite/gfortran.dg/coarray/proc_pointer_assign_1.f90 @@ -0,0 +1,29 @@ +!{ 
dg-do run } + +! Check that PR101100 is fixed. + +! Contributed by G. Steinmetz + +program p + type t +procedure(), pointer, nopass :: f + end type + + integer :: i = 0 + type(t) :: x[*] + + x%f => null() + if ( associated(x%f) ) stop 1 + + x%f => g + if (.not. associated(x%f) ) stop 2 + + call x%f() + if ( i /= 1 ) stop 3 + +contains + subroutine g() +i = 1 + end subroutine +end + -- 2.46.0
[PATCH 1/8] [RFC] Define new floating point builtin fetch_add functions
From: Matthew Malcomson This commit just defines the new names -- as yet don't implement them. Saving this commit because this is one decision, and recording what the decision was and why: Adding new floating point builtins for each floating point type that is defined in the general code *except* f128x (which would have a size greater than 16bytes -- the largest integral atomic operation we currently support). We have to base our naming on floating point *types* rather than sizes since different types can have the same size and the operations need to be distinguished based on type. N.b. one could make size-suffixed builtins that are still overloaded based on types but I thought that this was the cleaner approach. (Actual requirement is distinction based on mode, this is how I choose which internal function to use in a later patch. I believe that defining the function in terms of types and internally mapping to modes is a sensible split between user interface and internal implementation). N.b. in order to choose whether these operations are available or not in something like libstdc++ I use something like `__has_builtin(__atomic_fetch_add_fp)`. This happens to be the builtin for implementing the relevant operation on doubles, but it also seems like a nice name to check. - This would require that all compiler implementations have floating point atomics for all floating point types they support available at the same time. I don't expect this is much of a problem but invite dissent. N.b. I used the below type suffixes (following what seems like the existing convention for builtins): - float -> f - double -> - long double -> l - _FloatN -> fN (for N <- (16, 32, 64, 128)) - _FloatNx -> fNx (for N <- (32, 64)) Richi suggested doing this expansion generally for all these builtins following Cxy _Atomic semantics on IRC. 
Since C hasn't specified any fetch_ semantics for floating point types, C++ has only specified `atomic<>::fetch_{add,sub}`, and the operations other than these are all bitwise operations (which don't map well to floating point), I believe I have followed that suggestion by implementing all fetch_{sub,add}/{add,sub}_fetch operations. I have not implemented anything for the __sync_* builtins on the belief that these are legacy and new code should use the __atomic_* builtins. Happy to adjust if that is a bad choice. Only the new function types were needed for most cases. The Fortran frontend does not use `builtin-types.def` so it needed the fortran `types.def` to be updated to include the floating point built in types in `enum builtin_type` local to `gfc_init_builtin_functions`. - N.b. these types are already available in the fortran frontend (being defined by `build_common_tree_nodes`), it's just that they were not available for sync-builtins.def functions until this commit. -- N.b. for this RFC I've not checked that any other frontends can access these builtins. Even the fortran frontend I've only adjusted things to ensure stuff builds.
Signed-off-by: Matthew Malcomson --- gcc/builtin-types.def | 20 ++ gcc/fortran/types.def | 48 +++ gcc/sync-builtins.def | 40 3 files changed, 108 insertions(+) diff --git a/gcc/builtin-types.def b/gcc/builtin-types.def index c97d6bad1de..97ccd945b55 100644 --- a/gcc/builtin-types.def +++ b/gcc/builtin-types.def @@ -802,6 +802,26 @@ DEF_FUNCTION_TYPE_3 (BT_FN_VOID_VPTR_I2_INT, BT_VOID, BT_VOLATILE_PTR, BT_I2, BT DEF_FUNCTION_TYPE_3 (BT_FN_VOID_VPTR_I4_INT, BT_VOID, BT_VOLATILE_PTR, BT_I4, BT_INT) DEF_FUNCTION_TYPE_3 (BT_FN_VOID_VPTR_I8_INT, BT_VOID, BT_VOLATILE_PTR, BT_I8, BT_INT) DEF_FUNCTION_TYPE_3 (BT_FN_VOID_VPTR_I16_INT, BT_VOID, BT_VOLATILE_PTR, BT_I16, BT_INT) +DEF_FUNCTION_TYPE_3 (BT_FN_FLOAT_VPTR_FLOAT_INT, BT_FLOAT, BT_VOLATILE_PTR, +BT_FLOAT, BT_INT) +DEF_FUNCTION_TYPE_3 (BT_FN_DOUBLE_VPTR_DOUBLE_INT, BT_DOUBLE, BT_VOLATILE_PTR, +BT_DOUBLE, BT_INT) +DEF_FUNCTION_TYPE_3 (BT_FN_LONGDOUBLE_VPTR_LONGDOUBLE_INT, BT_LONGDOUBLE, +BT_VOLATILE_PTR, BT_LONGDOUBLE, BT_INT) +DEF_FUNCTION_TYPE_3 (BT_FN_BFLOAT16_VPTR_BFLOAT16_INT, BT_BFLOAT16, BT_VOLATILE_PTR, +BT_BFLOAT16, BT_INT) +DEF_FUNCTION_TYPE_3 (BT_FN_FLOAT16_VPTR_FLOAT16_INT, BT_FLOAT16, BT_VOLATILE_PTR, +BT_FLOAT16, BT_INT) +DEF_FUNCTION_TYPE_3 (BT_FN_FLOAT32_VPTR_FLOAT32_INT, BT_FLOAT32, BT_VOLATILE_PTR, +BT_FLOAT32, BT_INT) +DEF_FUNCTION_TYPE_3 (BT_FN_FLOAT64_VPTR_FLOAT64_INT, BT_FLOAT64, BT_VOLATILE_PTR, +BT_FLOAT64, BT_INT) +DEF_FUNCTION_TYPE_3 (BT_FN_FLOAT128_VPTR_FLOAT128_INT, BT_FLOAT128, BT_VOLATILE_PTR, +BT_FLOAT128, BT_INT) +DEF_FUNCTION_TYPE_3 (BT_FN_FLOAT32X_VPTR_FLOAT32X_INT, BT_FLOAT32X, BT_VOLATILE_PTR, +BT_FLOAT32X, BT_INT) +DEF_FUNCTION
[PATCH 0/8] [RFC] Introduce floating point fetch_add builtins
From: Matthew Malcomson Hello, this is an RFC for adding an atomic floating point fetch_add builtin (and variants) to GCC. The atomic fetch_add operation is defined to work on the base floating point types in the C++20 standard chapter 31.7.3, and extended to work for all cv-unqualified floating point types in C++23 chapter 33.5.7.4. Honestly not sure who to Cc, please do point me to someone else if that's better. This is nowhere near complete (for one thing even the tests I've added don't fully pass), but I think I have a complete enough idea that it's worth checking if this is something that could be agreed on. As it stands no target except the nvptx backend would natively support these operations. Main questions that I'm looking to resolve with this RFC: 1) Would GCC be OK accepting this implementation even though no backend would be implementing these yet? - AIUI only the nvptx backend could theoretically implement this. - Even without a backend implementing it natively, the ability to use this in code (especially libstdc++) enables other compilers to generate better code for GPU's using standard C++. 2) Would libstdc++ be OK relying on `__has_builtin(__atomic_fetch_add_fp)` (i.e. a check on the resolved builtin rather than the more user-facing one) in order to determine whether floating point atomic fetch_add is available. - N.b. this builtin is actually the builtin working on the "double" type, one would have to rely on any compilers implementing that particular resolved builtin to also implement the other floating point atomic fetch_add builtins that they would want to support in libstdc++ `atomic<[floating_point_type]>::fetch_add`. More specific questions about the choice of which builtins to implement and whether the types are OK: 1) Is it OK to not implement the `__sync_*` versions? 
Since these are deprecated and the `__atomic_*` versions are there to match the C/C++ code atomic operations (which is a large part of the reason for the new floating point operations). 2) Is it OK to not implement all the `fetch_*` operations? None of the bitwise operations are specified for C++ and bitwise operations are (AIUI) rarely used on floating point values. 3) Wanting to be able to farm out to libatomic meant that we need constant names for the specialised functions. - This led to the naming convention based on floating point type. - That naming convention *could* be updated to include the special backend floating point types if needed. I have not done this mostly because I thought it would not add much, though I have not looked into this very closely. 4) Wanting to name the functions based on floating point type rather than size meant that the mapping from type passed to the overloaded version to specialised builtin was less direct than for the integral versions. - Ended up with a hard-coded table in the source to check this. - Would very much like some better approach, not certain what better approach I could find. - Will eventually at least share the hard-coded tables (which isn't happening yet because this is at RFC level). 5) Are there any other types that I should use? Similarly are there any types that I'm trying to use that I shouldn't? I *believe* I've implemented all the types that make sense and are general builtin types. Could easily have missed some (e.g. left `_Float128x` alone because AIUI it's larger than 128bits which means we don't have any other atomic operations on such data), could also easily be misunderstanding the mention in the C++ standards of "extended" types (which I believe is the `_Float*` and `bfloat16` types). 6) Anything special about floating point maths that I'm tripping up on? (Especially around CAS loop where we expand the operation directly, sometimes convert a PLUS into a MINUS of a negative value ...). 
Don't know of anything myself, but also a bit wary of floating point maths. N.b. I know that there's a reasonable amount of work left in: 1) Ensuring that all the places which use atomic types are updated (e.g. asan), 2) Ensuring that all frontends can use these to the level that they could use the integral atomics. 3) Ensuring the major backends can still compile libatomic (I had to do something in AArch64 libatomic backend to make it compile, seems like there might be more to do for others). Matthew Malcomson (8): Define new floating point builtin fetch_add functions Add FP types for atomic builtin overload resolution Tie the new atomic builtins to the backend Have libatomic working as first draft Use new builtins in libstdc++ First attempt at testsuite Mention floating point atomic fetch_add etc in docs Add demo implementation of one of the operations gcc/builtin-types.def | 20 + gcc/builtins.cc | 153 ++- gcc/c-family/c-c
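As an illustration of question 2 above, a user-side feature check could look roughly like the following C++ sketch. The macro `SKETCH_HAVE_FP_FETCH_ADD` and the function `sketch_fetch_add` are hypothetical names; `__atomic_fetch_add_fp` only exists with this RFC applied, so on current compilers the first branch is compiled out and has not been exercised. The fallback uses the existing generic `__atomic_load` and `__atomic_compare_exchange` builtins:

```cpp
// Detect the resolved double builtin named in the RFC, if the compiler
// supports __has_builtin at all.
#if defined(__has_builtin)
#  if __has_builtin(__atomic_fetch_add_fp)
#    define SKETCH_HAVE_FP_FETCH_ADD 1
#  endif
#endif
#ifndef SKETCH_HAVE_FP_FETCH_ADD
#  define SKETCH_HAVE_FP_FETCH_ADD 0
#endif

// Atomically add VAL to *PTR and return the previous value.
double sketch_fetch_add(double *ptr, double val)
{
#if SKETCH_HAVE_FP_FETCH_ADD
    // Assumption: with the RFC applied, the overloaded builtin accepts
    // a pointer to double directly.
    return __atomic_fetch_add(ptr, val, __ATOMIC_SEQ_CST);
#else
    // Fallback: compare-exchange loop using the generic builtins, which
    // compare and swap the object representation of the double.
    double expected;
    __atomic_load(ptr, &expected, __ATOMIC_RELAXED);
    double desired = expected + val;
    while (!__atomic_compare_exchange(ptr, &expected, &desired,
                                      /*weak=*/true, __ATOMIC_SEQ_CST,
                                      __ATOMIC_RELAXED))
        desired = expected + val;  // EXPECTED was refreshed by the failure
    return expected;
#endif
}
```

The check is on the double specialisation only, which is exactly the coupling the question asks about: a compiler that reports this one builtin is being trusted to provide the others as well.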
[PATCH 7/8] [RFC] Mention floating point atomic fetch_add etc in docs
From: Matthew Malcomson Signed-off-by: Matthew Malcomson --- gcc/doc/extend.texi | 12 1 file changed, 12 insertions(+) diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi index 66c99ef7a66..a3e3e7da5d6 100644 --- a/gcc/doc/extend.texi +++ b/gcc/doc/extend.texi @@ -13501,6 +13501,18 @@ the same format with the addition of a @samp{size_t} parameter inserted as the first parameter indicating the size of the object being pointed to. All objects must be the same size. +Moreover, the @samp{__atomic_fetch_add}, @samp{__atomic_fetch_sub}, +@samp{__atomic_add_fetch} and @samp{__atomic_sub_fetch} builtins can all +accept floating point types of @code{float}, @code{double}, @code{long double}, +@code{bfloat16}, @code{_Float16}, @code{_Float32}, @code{_Float64}, +@code{_Float128}, @code{_Float32x} and @code{_Float64x}. These use a lock-free +built-in function if the size of the floating point type makes that possible +and otherwise leave an external call to be resolved at run time. This external +call is of the same format but specialised to the given floating point type. +The specialised versions of these functions are denoted by one of the +suffixes @code{_fpf}, @code{_fp}, @code{_fpl}, @code{_fpf16b}, @code{_fpf16}, +@code{_fpf32}, @code{_fpf64}, @code{_fpf128}, @code{_fpf32x}, @code{_fpf64x}. + There are 6 different memory orders that can be specified. These map to the C++11 memory orders with the same names, see the C++11 standard or the @uref{https://gcc.gnu.org/wiki/Atomic/GCCMM/AtomicSync,GCC wiki -- 2.43.0
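The naming scheme in the documentation text above can be sketched as a lookup table. This is an illustration in C++ only: the struct and function names are hypothetical, and pairing the two lists position by position is an assumption drawn from the order in which the types and suffixes are listed (it agrees with `__atomic_fetch_add_fp` being the double variant mentioned in the cover letter):

```cpp
#include <cstddef>
#include <string>

// One entry per floating point type named in the documentation patch.
struct FpSuffix {
    const char *type_name;
    const char *suffix;
};

static const FpSuffix fp_suffixes[] = {
    {"float", "_fpf"},         {"double", "_fp"},
    {"long double", "_fpl"},   {"bfloat16", "_fpf16b"},
    {"_Float16", "_fpf16"},    {"_Float32", "_fpf32"},
    {"_Float64", "_fpf64"},    {"_Float128", "_fpf128"},
    {"_Float32x", "_fpf32x"},  {"_Float64x", "_fpf64x"},
};

// Build the name of the specialised function for BASE (for example
// "__atomic_fetch_add") and a floating point type name.  Returns an
// empty string when the type has no specialisation in the table.
std::string specialised_builtin_name(const std::string &base,
                                     const std::string &type_name)
{
    const std::size_t count = sizeof fp_suffixes / sizeof fp_suffixes[0];
    for (std::size_t i = 0; i < count; ++i)
        if (type_name == fp_suffixes[i].type_name)
            return base + fp_suffixes[i].suffix;
    return std::string();
}
```

Keeping the names constant per type, rather than per size, is what lets the external call be resolved by libatomic even when two types share a size.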
[PATCH 3/8] [RFC] Tie the new atomic builtins to the backend
From: Matthew Malcomson Need to implement something in the Things implemented in this patch: 1) Update the optabs definitions to include floating point versions of atomic fetch_add variants. 2) When expanding into a CAS loop in RTL because the floating point optab is not implemented, there are now two different modes. One is the integral mode in which the atomic CAS (and load) should be performed, and one is the floating point mode in which the operation should be performed. - Extra handling of modes etc in `expand_atomic_fetch_op`. Things to highlight to any reviewer: 1) Needed another mapping from builtin to mode. This is *almost* shared code between this and the frontend. Looks like this could be shared if I put some effort into it. 2) I do not always expand into the modify before version, but also use the modify after version when unable to inline. - From looking at the dates on different parts of the code, it seems that this used to be needed before libatomic was added as a target library. Since libatomic currently implements both the fetch_ and _fetch versions I don't believe it's needed any more. 3) I `extract_bit_field` to convert between representations when expanding as a fallback (because fallback CAS loop loads in integral register and we want to reinterpret that as a floating point type before the intermediate operation). - Not sure if there's a better way I don't know about. Other than that everything seems mostly straight-forwardly following what is already done. 
Signed-off-by: Matthew Malcomson --- gcc/builtins.cc | 153 +--- gcc/optabs.cc | 101 gcc/optabs.def | 6 +- gcc/optabs.h| 2 +- 4 files changed, 241 insertions(+), 21 deletions(-) diff --git a/gcc/builtins.cc b/gcc/builtins.cc index 0b902896ddd..0ffd7d0b211 100644 --- a/gcc/builtins.cc +++ b/gcc/builtins.cc @@ -6394,6 +6394,46 @@ get_builtin_sync_mode (int fcode_diff) return int_mode_for_size (BITS_PER_UNIT << fcode_diff, 0).require (); } +/* Reconsitute the machine modes relevant for this builtin operation from the + builtin difference from the _N version of a fetch_add atomic. + + Only works for floating point atomic builtins. + FCODE_DIFF should be fcode - base, where base is the FOO_N code for the + group of builtins. N.b. this is a different base to that used by + `get_builtin_sync_mode` because that matches the builtin enum offset used in + c-common.cc to find the builtin enum from a given MODE. + + TODO Really do need to figure out a bit neater code here. Should not be + inlining the mapping from type to offset in two different places. */ +static inline machine_mode +get_builtin_fp_sync_mode (int fcode_diff, machine_mode *mode) +{ + struct type_to_offset { tree type; size_t offset; }; + static const struct type_to_offset fp_type_mappings[] = { +{ float_type_node, 6 }, +{ double_type_node, 7 }, +{ long_double_type_node, 8 }, +{ bfloat16_type_node ? bfloat16_type_node : error_mark_node, 9 }, +{ float16_type_node ? float16_type_node : error_mark_node, 10 }, +{ float32_type_node ? float32_type_node : error_mark_node, 11 }, +{ float64_type_node ? float64_type_node : error_mark_node, 12 }, +{ float128_type_node ? float128_type_node : error_mark_node, 13 }, +{ float32x_type_node ? float32x_type_node : error_mark_node, 14 }, +{ float64x_type_node ? 
float64x_type_node : error_mark_node, 15 } + }; + gcc_assert (fcode_diff <= 15 && fcode_diff >= 6); + for (size_t i = 0; i < sizeof(fp_type_mappings)/sizeof(fp_type_mappings[0]); i++) + { + if ((size_t)fcode_diff == fp_type_mappings[i].offset) + { +*mode = TYPE_MODE (fp_type_mappings[i].type); +return int_mode_for_size (GET_MODE_SIZE (*mode) * BITS_PER_UNIT, 0) + .require (); + } + } + gcc_unreachable (); +} + /* Expand the memory expression LOC and return the appropriate memory operand for the builtin_sync operations. */ @@ -6886,9 +6926,10 @@ expand_builtin_atomic_store (machine_mode mode, tree exp) resolved to an instruction sequence. */ static rtx -expand_builtin_atomic_fetch_op (machine_mode mode, tree exp, rtx target, +expand_builtin_atomic_fetch_op (machine_mode expand_mode, tree exp, rtx target, enum rtx_code code, bool fetch_after, - bool ignore, enum built_in_function ext_call) + bool ignore, enum built_in_function ext_call, + machine_mode load_mode = VOIDmode) { rtx val, mem, ret; enum memmodel model; @@ -6898,13 +6939,13 @@ expand_builtin_atomic_fetch_op (machine_mode mode, tree exp, rtx target, model = get_memmodel (CALL_EXPR_ARG (exp, 2)); /* Expand the operands. */ - mem = get_builtin_sync_mem (CALL_EXPR_ARG (exp, 0), mode); - val = expand_expr_force_mod
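The "two modes" point in the description above can be illustrated at source level. The following C++ sketch is not the RTL expansion from the patch: it keeps the value in a 32-bit integer (standing in for the integral mode used for the load and compare-and-swap) and reinterprets it as a float only for the addition (standing in for the floating point mode). It uses memcpy where the patch uses `extract_bit_field`, all function names are hypothetical, and it assumes float is 32 bits wide:

```cpp
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(std::uint32_t),
              "sketch assumes a 32-bit float");

// Reinterpret between the integral and floating point representations.
std::uint32_t float_to_bits(float value)
{
    std::uint32_t bits;
    std::memcpy(&bits, &value, sizeof bits);
    return bits;
}

float bits_to_float(std::uint32_t bits)
{
    float value;
    std::memcpy(&value, &bits, sizeof value);
    return value;
}

// Atomically add VAL to the float stored as a bit pattern in *STORAGE
// and return the previous float value.
float fetch_add_via_integer_cas(std::uint32_t *storage, float val)
{
    // Load and compare-exchange happen on the integer representation.
    std::uint32_t old_bits = __atomic_load_n(storage, __ATOMIC_RELAXED);
    for (;;) {
        // The arithmetic happens on the floating point representation.
        float old_value = bits_to_float(old_bits);
        std::uint32_t new_bits = float_to_bits(old_value + val);
        if (__atomic_compare_exchange_n(storage, &old_bits, new_bits,
                                        /*weak=*/true, __ATOMIC_SEQ_CST,
                                        __ATOMIC_RELAXED))
            return old_value;
        // On failure OLD_BITS holds the current contents; retry.
    }
}
```

Comparing bit patterns rather than float values also means a stored NaN or negative zero does not confuse the retry loop.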
[PATCH 8/8] [RFC] Add demo implementation of one of the operations
From: Matthew Malcomson Do demo implementation in AArch64 since that's the backend I'm most familiar with. Nothing much else to say -- nice to see that the demo implementation seems to work as expected (being used for fetch_add, add_fetch and sub_fetch even though it's only defined for fetch_sub). Demo implementation ensures that I can run some execution tests. Demo is added behind a flag in order to be able to run the testsuite with different variants (with the flag and without). Ensuring that the functionality worked for both the fallback and when this optab was implemented (also check with the two different fallbacks of either using libatomic or inlining a CAS loop). In order to run with both this and the fallback implementation I use the following flag in RUNTESTFLAGS: --target_board='unix {unix/-mtesting-fp-atomics}' Signed-off-by: Matthew Malcomson --- gcc/config/aarch64/aarch64.h | 2 ++ gcc/config/aarch64/aarch64.opt | 5 + gcc/config/aarch64/atomics.md | 15 +++ 3 files changed, 22 insertions(+) diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h index fac1882bcb3..c2f37545cd7 100644 --- a/gcc/config/aarch64/aarch64.h +++ b/gcc/config/aarch64/aarch64.h @@ -119,6 +119,8 @@ of LSE instructions. */ #define TARGET_OUTLINE_ATOMICS (aarch64_flag_outline_atomics) +#define TARGET_TESTING_FP_ATOMICS (aarch64_flag_testing_fp_atomics) + /* Align definitions of arrays, unions and structures so that initializations and copies can be made more efficient. This is not ABI-changing, so it only affects places where we can see the diff --git a/gcc/config/aarch64/aarch64.opt b/gcc/config/aarch64/aarch64.opt index 6356c419399..ed031258575 100644 --- a/gcc/config/aarch64/aarch64.opt +++ b/gcc/config/aarch64/aarch64.opt @@ -332,6 +332,11 @@ moutline-atomics Target Var(aarch64_flag_outline_atomics) Init(2) Save Generate local calls to out-of-line atomic operations. 
+mtesting-fp-atomics +Target Var(aarch64_flag_testing_fp_atomics) Init(0) Save +Use the demonstration implementation of atomic_fetch_sub_ for floating +point modes. + -param=aarch64-vect-compare-costs= Target Joined UInteger Var(aarch64_vect_compare_costs) Init(1) IntegerRange(0, 1) Param When vectorizing, consider using multiple different approaches and use diff --git a/gcc/config/aarch64/atomics.md b/gcc/config/aarch64/atomics.md index 32a0a723732..ee8fbcd6c58 100644 --- a/gcc/config/aarch64/atomics.md +++ b/gcc/config/aarch64/atomics.md @@ -368,6 +368,21 @@ ;; However we also implement the acquire memory barrier with DMB LD, ;; and so the ST is not blocked by the barrier. +(define_insn "atomic_fetch_sub" + [(set (match_operand:GPF 0 "register_operand" "=&w") +(match_operand:GPF 1 "aarch64_sync_memory_operand" "+Q")) +(set (match_dup 1) +(unspec_volatile:GPF +[(minus:GPF (match_dup 1) + (match_operand:GPF 2 "register_operand" "w")) + (match_operand:SI 3 "const_int_operand")] + UNSPECV_ATOMIC_LDOP_PLUS)) +(clobber (match_scratch:GPF 4 "=w"))] +"TARGET_TESTING_FP_ATOMICS" +"// Here's your sandwich.\;ldr %0, %1\;fsub %4, %0, %2\;str %4, %1\;// END" +) + + (define_insn "aarch64_atomic__lse" [(set (match_operand:ALLI 0 "aarch64_sync_memory_operand" "+Q") (unspec_volatile:ALLI -- 2.43.0
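The remark that a pattern defined only for fetch_sub ends up serving fetch_add, add_fetch and sub_fetch can be pictured with a small C++ sketch. This shows the arithmetic relationship only, under the assumption that the generic expansion negates the operand and recomputes the "after" value from the "before" value; it is not the optabs.cc code, and every name in it is hypothetical:

```cpp
#include <atomic>

// Stand-in for the single primitive the demo backend provides:
// subtract VAL and return the value held beforehand.
double primitive_fetch_sub(std::atomic<double> &obj, double val)
{
    double old_value = obj.load(std::memory_order_relaxed);
    // On failure OLD_VALUE is refreshed and the difference recomputed.
    while (!obj.compare_exchange_weak(old_value, old_value - val)) {
    }
    return old_value;
}

// fetch_add: subtracting the negated operand is the same addition.
double derived_fetch_add(std::atomic<double> &obj, double val)
{
    return primitive_fetch_sub(obj, -val);
}

// sub_fetch: redo the subtraction on the returned old value.
double derived_sub_fetch(std::atomic<double> &obj, double val)
{
    return primitive_fetch_sub(obj, val) - val;
}

// add_fetch: combine both adjustments.
double derived_add_fetch(std::atomic<double> &obj, double val)
{
    return primitive_fetch_sub(obj, -val) + val;
}
```

Recomputing the result outside the atomic operation gives the stored value only when the rounding mode is the same in both places, which is one of the floating point details a real implementation would have to consider.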
[PATCH 2/8] [RFC] Add FP types for atomic builtin overload resolution
From: Matthew Malcomson Have a bit of an ugly mapping from floating point type to the builtin using that type. Would like to find some code-sharing between this, the function (in a later patch in this series) that finds the relevant mode from a given builtin, and the general sync-builtins.def file. As yet don't have a nice way to do that, but haven't looked that hard. Other than that, seems we can cleanly emit the functions that we need. N.b. we match which function to use based on the MODE of the type for two reasons: 1) Can't match directly on type as otherwise `typedef float x` would mean that `x` could no longer be used with that intrinsic. 2) MODE (i.e. the type's ABI) is the thing that we need to distinguish between when deciding which fundamental operation needs to be applied. Signed-off-by: Matthew Malcomson --- gcc/c-family/c-common.cc | 88 1 file changed, 70 insertions(+), 18 deletions(-) diff --git a/gcc/c-family/c-common.cc b/gcc/c-family/c-common.cc index e7e371fd26f..c0a2b136d67 100644 --- a/gcc/c-family/c-common.cc +++ b/gcc/c-family/c-common.cc @@ -7360,13 +7360,15 @@ speculation_safe_value_resolve_return (tree first_param, tree result) static int sync_resolve_size (tree function, vec *params, bool fetch, - bool orig_format) + bool orig_format, + int *fp_specialisation_offset) { /* Type of the argument. */ tree argtype; /* Type the argument points to. 
*/ tree type; int size; + bool valid_float = false; if (vec_safe_is_empty (params)) { @@ -7385,7 +7387,8 @@ sync_resolve_size (tree function, vec *params, bool fetch, goto incompatible; type = TREE_TYPE (type); - if (!INTEGRAL_TYPE_P (type) && !POINTER_TYPE_P (type)) + valid_float = fp_specialisation_offset && fetch && SCALAR_FLOAT_TYPE_P (type); + if (!INTEGRAL_TYPE_P (type) && !POINTER_TYPE_P (type) && !valid_float) goto incompatible; if (!COMPLETE_TYPE_P (type)) @@ -7402,6 +7405,40 @@ sync_resolve_size (tree function, vec *params, bool fetch, && !targetm.scalar_mode_supported_p (TImode)) return -1; + if (valid_float) +{ + tree fp_type = type; + /* TODO Want a better reverse-mapping between an argument type and + the builtin enum. */ + struct type_to_offset { tree type; size_t offset; }; + static const struct type_to_offset fp_type_mappings[] = { +{ float_type_node, 6 }, +{ double_type_node, 7 }, +{ long_double_type_node, 8 }, +{ bfloat16_type_node ? bfloat16_type_node : error_mark_node, 9 }, +{ float16_type_node ? float16_type_node : error_mark_node, 10 }, +{ float32_type_node ? float32_type_node : error_mark_node, 11 }, +{ float64_type_node ? float64_type_node : error_mark_node, 12 }, +{ float128_type_node ? float128_type_node : error_mark_node, 13 }, +{ float32x_type_node ? float32x_type_node : error_mark_node, 14 }, +{ float64x_type_node ? float64x_type_node : error_mark_node, 15 } + }; + size_t offset = 0; + for (size_t i = 0; + i < sizeof(fp_type_mappings)/sizeof(fp_type_mappings[0]); + ++i) { +if (TYPE_MODE (fp_type) == TYPE_MODE (fp_type_mappings[i].type)) + { +offset = fp_type_mappings[i].offset; +break; + } + } + if (offset == 0) +goto incompatible; + *fp_specialisation_offset = offset; + return -1; +} + if (size == 1 || size == 2 || size == 4 || size == 8 || size == 16) return size; @@ -7462,9 +7499,10 @@ sync_resolve_params (location_t loc, tree orig_function, tree function, arguments (e.g. 
EXPECTED argument of __atomic_compare_exchange_n), bool arguments (e.g. WEAK argument) or signed int arguments (memmodel kinds). */ - if (TREE_CODE (arg_type) == INTEGER_TYPE && TYPE_UNSIGNED (arg_type)) + if ((TREE_CODE (arg_type) == INTEGER_TYPE && TYPE_UNSIGNED (arg_type)) + || SCALAR_FLOAT_TYPE_P (arg_type)) { - /* Ideally for the first conversion we'd use convert_for_assignment + /* Ideally for the first conversion we'd use convert_for_assignment so that we get warnings for anything that doesn't match the pointer type. This isn't portable across the C and C++ front ends atm. */ val = (*params)[parmnum]; @@ -8256,7 +8294,6 @@ atomic_bitint_fetch_using_cas_loop (location_t loc, NULL_TREE); } - /* Some builtin functions are placeholders for other expressions. This function should be called immediately after parsing the call expression before surrounding code has committed to the type of the expression. @@ -8277,6 +8314,9 @@ resolve_overloaded_builtin (location_t loc, tree function, and so must be rejected. */ bool fetch_op = true; bool orig_format = true; + /* Is this fun
[PATCH 5/8] [RFC] Use new builtins in libstdc++
From: Matthew Malcomson The points in question here are: 1) Whether checking for this particular internal builtin is OK (this one happens to be the one implementing the operation for a `double`, we would have to rely on the approach that if anyone implements this operation for a `double` they implement it for all the floating point types that their C++ frontend and libstdc++ handle). 2) Whether the `#if` bit should be somewhere else instead of put in the `__fetch_add_flt` function. I put it there because that's where it seemed natural, but am not familiar enough with libstdc++ to be confident in that decision. We still need the CAS loop fallback for any compiler that doesn't implement this builtin, and hence will still need some extra choice to be made for floating point types. Once all compilers we care about implement this we can remove this special handling and merge the floating point and integral operations into the same template. Signed-off-by: Matthew Malcomson --- libstdc++-v3/include/bits/atomic_base.h | 16 1 file changed, 16 insertions(+) diff --git a/libstdc++-v3/include/bits/atomic_base.h b/libstdc++-v3/include/bits/atomic_base.h index 1c2367b39b6..d3b1a022db2 100644 --- a/libstdc++-v3/include/bits/atomic_base.h +++ b/libstdc++-v3/include/bits/atomic_base.h @@ -1217,30 +1217,41 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION _Tp __fetch_add_flt(_Tp* __ptr, _Val<_Tp> __i, memory_order __m) noexcept { +#if __has_builtin(__atomic_fetch_add_fp) + return __atomic_fetch_add(__ptr, __i, int(__m)); +#else _Val<_Tp> __oldval = load(__ptr, memory_order_relaxed); _Val<_Tp> __newval = __oldval + __i; while (!compare_exchange_weak(__ptr, __oldval, __newval, __m, memory_order_relaxed)) __newval = __oldval + __i; return __oldval; +#endif } template _Tp __fetch_sub_flt(_Tp* __ptr, _Val<_Tp> __i, memory_order __m) noexcept { +#if __has_builtin(__atomic_fetch_sub) + return __atomic_fetch_sub(__ptr, __i, int(__m)); +#else _Val<_Tp> __oldval = load(__ptr, memory_order_relaxed); _Val<_Tp> 
__newval = __oldval - __i; while (!compare_exchange_weak(__ptr, __oldval, __newval, __m, memory_order_relaxed)) __newval = __oldval - __i; return __oldval; +#endif } template _Tp __add_fetch_flt(_Tp* __ptr, _Val<_Tp> __i) noexcept { +#if __has_builtin(__atomic_add_fetch) + return __atomic_add_fetch(__ptr, __i, __ATOMIC_SEQ_CST); +#else _Val<_Tp> __oldval = load(__ptr, memory_order_relaxed); _Val<_Tp> __newval = __oldval + __i; while (!compare_exchange_weak(__ptr, __oldval, __newval, @@ -1248,12 +1259,16 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION memory_order_relaxed)) __newval = __oldval + __i; return __newval; +#endif } template _Tp __sub_fetch_flt(_Tp* __ptr, _Val<_Tp> __i) noexcept { +#if __has_builtin(__atomic_sub_fetch) + return __atomic_sub_fetch(__ptr, __i, __ATOMIC_SEQ_CST); +#else _Val<_Tp> __oldval = load(__ptr, memory_order_relaxed); _Val<_Tp> __newval = __oldval - __i; while (!compare_exchange_weak(__ptr, __oldval, __newval, @@ -1261,6 +1276,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION memory_order_relaxed)) __newval = __oldval - __i; return __newval; +#endif } } // namespace __atomic_impl -- 2.43.0
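The shape of this change (use the builtin when the compiler advertises it, otherwise keep the compare-and-swap loop) can be sketched in plain C with the generic `__atomic` builtins. This is a minimal sketch, not the libstdc++ code: `fetch_add_double` and `fetch_add_double_cas` are hypothetical names, and `__atomic_fetch_add_fp` is the internal builtin name that the series checks for, which compilers without these patches simply report as missing.

```c
#include <stdbool.h>

/* Older compilers may not provide __has_builtin at all.  */
#ifndef __has_builtin
# define __has_builtin(x) 0
#endif

/* Fallback: fetch-add on a double built from a compare-exchange loop.
   On failure __atomic_compare_exchange refreshes OLDVAL with the value
   currently in memory, so the sum is recomputed on each retry.  */
static double fetch_add_double_cas(double *ptr, double val, int memorder)
{
  double oldval, newval;
  __atomic_load(ptr, &oldval, __ATOMIC_RELAXED);
  do
    newval = oldval + val;
  while (!__atomic_compare_exchange(ptr, &oldval, &newval, true,
                                    memorder, __ATOMIC_RELAXED));
  return oldval;
}

/* Dispatch at preprocessing time, in the same way as the #if in the patch.  */
static double fetch_add_double(double *ptr, double val)
{
#if __has_builtin(__atomic_fetch_add_fp)
  return __atomic_fetch_add(ptr, val, __ATOMIC_SEQ_CST);
#else
  return fetch_add_double_cas(ptr, val, __ATOMIC_SEQ_CST);
#endif
}
```

Both branches return the value held before the addition, so callers cannot tell which path was compiled in.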
[PATCH 4/8] [RFC] Have libatomic working as first draft
From: Matthew Malcomson As it stands there are still a few things to look at to see whether they could be improved: 1) Need to find the exact version of automake to use. I'm using automake 1.15.1 from https://ftp.gnu.org/gnu/automake/ but the header is claiming I'm using automake 1.15. 2) The internal naming is all a little "not right" for floating point. E.g. the SIZE() macro is no longer adding a SIZE integer suffix to something but instead adding a suffix representing a type. Not sure whether the churn to fix this is worth it -- will ask upstream. 3) Have not implemented the word-size compare and swap loop fallback. This is because the implementation uses a mask and the mask is not always the same for any given architecture. Hence the existing approach in code would not work for all floating point types. - I would appreciate some feedback about whether this is OK to not implement. Seems reasonable to me. 4) In the existing test for the availability of an atomic fetch operation there are two things that I do not know why they are needed and hence didn't add them to the check for atomic floating point fetch_{add,sub}. I just wanted to highlight this in case I missed something. 1) I only put the `x` register into a register with an `asm` call. To be honest I don't know why anything need be put into a register, but I didn't put the floating point value into a register because I didn't know of a standard GCC floating point register constraint that worked across all architectures. - Is there any need for this `asm` line? (I copied it from existing libatomic configure code without understanding it.) - Is there any need for the constant addition to be applied? 2) I used a cast of a 1.0 floating point literal as the "addition" for all floating point types in the configury check. - Is there something subtle I'm missing about this? (I *think* it's fine, but felt like this seemed to be a place where I could trip up without knowing). 
Description of things done in this commit: We implement the new floating point builtins around fetch_add. This is mostly a configure/makefile change. The main overview of the changes is that we create a new list of suffixes (_fpf, _fp, _fpl, _fp16b, _fp16, _fp32, _fp64, _fp128, _fp32x, _fp64x) and re-compile fadd_n.c and fsub_n.c for these suffixes. The existing machinery for checking whether a given atomic builtin is implemented is extended to check for these same suffixes on the atomic builtins. The existing machinery for generating atomic fetch_ implementations using a given suffix and general patterns is also re-used (n.b. with the exception that the implementation based on a compare and exchange of a word is not implemented because the pre-processor does not know the size of the floating point types). The AArch64 backend is updated slightly. It didn't build because it assumed there was some IFUNC for all operations implemented (and didn't have any IFUNC for the new floating point operations). The new functions are advertised as LIBATOMIC_1.3 in the linker map for the dynamic library. 
Signed-off-by: Matthew Malcomson --- libatomic/Makefile.am|6 +- libatomic/Makefile.in| 12 +- libatomic/acinclude.m4 | 49 + libatomic/auto-config.h.in | 84 +- libatomic/config/linux/aarch64/host-config.h |2 + libatomic/configure | 1153 +- libatomic/configure.ac |4 + libatomic/fadd_n.c | 23 + libatomic/fop_n.c|5 +- libatomic/fsub_n.c | 23 + libatomic/libatomic.map | 44 + libatomic/libatomic_i.h | 58 + libatomic/testsuite/Makefile.in |1 + 13 files changed, 1392 insertions(+), 72 deletions(-) diff --git a/libatomic/Makefile.am b/libatomic/Makefile.am index efadd9dcd48..ec24f1da86b 100644 --- a/libatomic/Makefile.am +++ b/libatomic/Makefile.am @@ -110,6 +110,7 @@ IFUNC_OPT = $(word $(PAT_S),$(IFUNC_OPTIONS)) M_SIZE = -DN=$(PAT_N) M_IFUNC= $(if $(PAT_S),$(IFUNC_DEF) $(IFUNC_OPT)) M_FILE = $(PAT_BASE)_n.c +M_FLOATING = $(if $(findstring $(PAT_N),$(FPSUFFIXES)),-DFLOATING) # The lack of explicit dependency on the source file means that VPATH cannot # work properly. Instead, perform this operation by hand. First, collect a @@ -120,10 +121,13 @@ all_c_files := $(foreach dir,$(search_path),$(wildcard $(dir)/*.c)) M_SRC = $(firstword $(filter %/$(M_FILE), $(all_c_files))) %_.lo: Makefile - $(LTCOMPILE) $(M_DEPS) $(M_SIZE) $(M_IFUNC) -c -o $@ $(M_SRC) + $(LTCOMPILE) $(M_DEPS) $(M_SIZE) $(M_FLOATING) $(M_IFUNC) -c -o $@ $(M_SRC) ## Include all of the sizes in the "normal" set of c
Re: [PATCH v2] aarch64: Add fp8 scalar types
Hi Claudio, > On 19 Sep 2024, at 15:09, Claudio Bantaloukas > wrote: > > External email: Use caution opening links or attachments > > > The ACLE defines a new scalar type, __mfp8. This is an opaque 8bit types that > can only be used by fp8 intrinsics. Additionally, the mfloat8_t type is made > available in arm_neon.h and arm_sve.h as an alias of the same. > > This implementation uses an unsigned INTEGER_TYPE, with precision 8 to > represent __mfp8. Conversions to int and other types are disabled via the > TARGET_INVALID_CONVERSION hook. > Additionally, operations that are typically available to integer types are > disabled via TARGET_INVALID_UNARY_OP and TARGET_INVALID_BINARY_OP hooks. > > gcc/ChangeLog: > >* config/aarch64/aarch64-builtins.cc (aarch64_mfp8_type_node): Add node >for __mfp8 type. >(aarch64_mfp8_ptr_type_node): Add node for __mfp8 pointer type. >(aarch64_init_fp8_types): New function to initialise fp8 types and >register with language backends. >* config/aarch64/aarch64.cc (aarch64_mangle_type): Add ABI mangling for >new type. >(aarch64_invalid_conversion): Add function implementing >TARGET_INVALID_CONVERSION hook that blocks conversion to and from the >__mfp8 type. >(aarch64_invalid_unary_op): Add function implementing TARGET_UNARY_OP >hook that blocks operations on __mfp8 other than &. >(aarch64_invalid_binary_op): Extend TARGET_BINARY_OP hook to disallow >operations on __mfp8 type. >(TARGET_INVALID_CONVERSION): Add define. >(TARGET_INVALID_UNARY_OP): Likewise. >* config/aarch64/aarch64.h (aarch64_mfp8_type_node): Add node for > __mfp8 >type. >(aarch64_mfp8_ptr_type_node): Add node for __mfp8 pointer type. >* config/aarch64/arm_neon.h (mfloat8_t): Add typedef. >* config/aarch64/arm_sve.h (mfloat8_t): Likewise. Looks like this typedef is a good candidate to go into arm_private_fp8.h so that arm_neon.h, arm_sve.h and arm_sme.h inherit it. 
Thanks, Kyrill > > gcc/testsuite/ChangeLog: > >* g++.target/aarch64/fp8_mangling.C: New tests exercising mangling. >* g++.target/aarch64/fp8_scalar_typecheck_2.C: New tests in C++. >* gcc.target/aarch64/fp8_scalar_1.c: New tests in C. >* gcc.target/aarch64/fp8_scalar_typecheck_1.c: Likewise. > --- > Hi, > Is this ok for master? I do not have commit rights yet, if ok, can someone > commit it on my behalf? > > Regression tested with aarch64-unknown-linux-gnu. > > Compared to V1 of the patch, in version 2: > - mangling for the __mfp8 type was added along with tests > - unneeded comments were removed > - simplified type checks in hooks > - simplified initialization of aarch64_mfp8_type_node > - separated mfloat8_t define from other fp types in arm_sve.h > - C++ tests were moved to g++.target/aarch64 > - added more tests around binary operations, function declaration, > type traits > - added tests exercising loads and stores from floating point registers > > > Thanks, > Claudio Bantaloukas > > gcc/config/aarch64/aarch64-builtins.cc| 20 + > gcc/config/aarch64/aarch64.cc | 54 ++- > gcc/config/aarch64/aarch64.h | 5 + > gcc/config/aarch64/arm_neon.h | 2 + > gcc/config/aarch64/arm_sve.h | 2 + > .../g++.target/aarch64/fp8_mangling.C | 44 ++ > .../aarch64/fp8_scalar_typecheck_2.C | 381 ++ > .../gcc.target/aarch64/fp8_scalar_1.c | 134 ++ > .../aarch64/fp8_scalar_typecheck_1.c | 356 > 9 files changed, 996 insertions(+), 2 deletions(-) > create mode 100644 gcc/testsuite/g++.target/aarch64/fp8_mangling.C > create mode 100644 gcc/testsuite/g++.target/aarch64/fp8_scalar_typecheck_2.C > create mode 100644 gcc/testsuite/gcc.target/aarch64/fp8_scalar_1.c > create mode 100644 gcc/testsuite/gcc.target/aarch64/fp8_scalar_typecheck_1.c > > diff --git a/gcc/config/aarch64/aarch64-builtins.cc > b/gcc/config/aarch64/aarch64-builtins.cc > index eb878b933fe..7d17df05a0f 100644 > --- a/gcc/config/aarch64/aarch64-builtins.cc > +++ b/gcc/config/aarch64/aarch64-builtins.cc > @@ -961,6 +961,11 @@ 
static GTY(()) tree aarch64_simd_intOI_type_node = > NULL_TREE; > static GTY(()) tree aarch64_simd_intCI_type_node = NULL_TREE; > static GTY(()) tree aarch64_simd_intXI_type_node = NULL_TREE; > > +/* The user-visible __mfp8 type, and a pointer to that type. Used > + across the back-end. */ > +tree aarch64_mfp8_type_node = NULL_TREE; > +tree aarch64_mfp8_ptr_type_node = NULL_TREE; > + > /* The user-visible __fp16 type, and a pointer to that type. Used >across the back-end. */ > tree aarch64_fp16_type_node = NULL_TREE; > @@ -1721,6 +1726,19 @@ aarch64_init_builtin_rsqrt (void) > } > } > > +/* Initialize the backend type that supports the user-visible __mfp8 > + type and its relative pointer type. */ > + > +stat
[Patch] OpenMP: Add get_device_from_uid/omp_get_uid_from_device routines
Hi all, in order to know and potentially re-use a specific offload device (reproducibility, affinity-wise close to a CPU (socket), …) a mapping between a (universally?) unique identifier and the OpenMP device number is useful. Thus, TR13 added support for it. This is a collateral patch caused by looking at the API routines for other reasons and looking at that part of the spec during the OpenMP F2F. Besides the added API routines, the UID will be used elsewhere: * In context selectors: 'target_device' supports 'uid()'. * In the OMP_AVAILABLE_DEVICES and OMP_DEFAULT_DEVICE env vars. @Sandra: Besides the usual .texi part, for the 'target_device' trait set: if you add a new GOMP routine for kind/arch/isa, can you also add a UID argument, so that we don't have to update the API when it is needed in the not-so-distant future? @Andrew + @Thomas: Any comment? Especially on the nvptx/gcn side (plugin + .texi)? @Jakub or anyone else: any comments, suggestions, remarks? [The patch was tested without GPUs, with one Nvidia GPU and one AMD GPU and seems to work fine.] Tobias OpenMP: Add get_device_from_uid/omp_get_uid_from_device routines Those TR13/OpenMP 6.0 routines permit a reproducible offloading to a specific device by mapping an OpenMP device number to a unique ID (UID). The GPU device UIDs should be universally unique, the one for the host is not. gcc/ChangeLog: * omp-general.cc (omp_runtime_api_procname): Add get_device_from_uid and omp_get_uid_from_device routines. include/ChangeLog: * cuda/cuda.h (cuDeviceGetUuid): Declare. (cuDeviceGetUuid_v2): Add prototype. libgomp/ChangeLog: * config/gcn/target.c (omp_get_uid_from_device, omp_get_device_from_uid): Add stub implementation. * config/nvptx/target.c (omp_get_uid_from_device, omp_get_device_from_uid): Likewise. * fortran.c (omp_get_uid_from_device_, omp_get_uid_from_device_8_): Add. * libgomp-plugin.h (GOMP_OFFLOAD_get_uid): Add prototype. * libgomp.h (struct gomp_device_descr): Add 'uid' and 'get_uid_func'. 
* libgomp.map (GOMP_6.0): New, including the new UID routines. * libgomp.texi (OpenMP Technical Report 13): Mark UID routines as 'Y'. (Device Information Routines): Document new UID routines. (Offload-Target Specifics): Document UID format. * omp.h.in (omp_get_device_from_uid, omp_get_uid_from_device): New prototype. * omp_lib.f90.in (omp_get_device_from_uid, omp_get_uid_from_device): New interface. * omp_lib.h.in: Likewise. * plugin/cuda-lib.def: Add cuDeviceGetUuid and cuDeviceGetUuid_v2 via CUDA_ONE_CALL_MAYBE_NULL. * plugin/plugin-gcn.c (GOMP_OFFLOAD_get_uid): New. * plugin/plugin-nvptx.c (GOMP_OFFLOAD_get_uid): New. * target.c (str_omp_initial_device): New static var. (STR_OMP_DEV_PREFIX): Define. (gomp_get_uid_for_device, omp_get_uid_from_device, omp_get_device_from_uid): New. (gomp_load_plugin_for_device): DLSYM_OPT the function 'get_uid'. (gomp_target_init): Set the device's 'uid' field to NULL. * testsuite/libgomp.c/device_uid.c: New test. * testsuite/libgomp.fortran/device_uid.f90: New test. 
gcc/omp-general.cc | 4 +- include/cuda/cuda.h | 7 ++ libgomp/config/gcn/target.c | 14 libgomp/config/nvptx/target.c| 14 libgomp/fortran.c| 15 + libgomp/libgomp-plugin.h | 1 + libgomp/libgomp.h| 2 + libgomp/libgomp.map | 8 +++ libgomp/libgomp.texi | 81 +++- libgomp/omp.h.in | 3 + libgomp/omp_lib.f90.in | 23 +++ libgomp/omp_lib.h.in | 23 +++ libgomp/plugin/cuda-lib.def | 2 + libgomp/plugin/plugin-gcn.c | 16 + libgomp/plugin/plugin-nvptx.c| 34 ++ libgomp/target.c | 56 libgomp/testsuite/libgomp.c/device_uid.c | 38 +++ libgomp/testsuite/libgomp.fortran/device_uid.f90 | 42 18 files changed, 379 insertions(+), 4 deletions(-) diff --git a/gcc/omp-general.cc b/gcc/omp-general.cc index de91ba8a4a7..12788ad0249 100644 --- a/gcc/omp-general.cc +++ b/gcc/omp-general.cc @@ -3260,6 +3260,7 @@ omp_runtime_api_procname (const char *name) "alloc", "calloc", "free", + "get_device_from_uid", "get_interop_int", "get_interop_ptr", "get_mapped_ptr", @@ -3338,12 +3339,13 @@ omp_runtime_api_procname (const char *name) as DECL_NAME only omp_* and omp_*_8 appear. */ "display_env", "get_ancestor_thread_num", - "init_allocator", + "omp_get_uid_from_device", "get_partition_place_nums", "get_place_num_procs", "get_place_proc_ids", "get_schedule", "get_team_size", + "init_allocator",
Re: [PATCH] aarch64: Improve scalar mode popcount expansion by using SVE [PR113860]
Pengxuan Zheng writes: > This is similar to the recent improvements to the Advanced SIMD popcount > expansion by using SVE. We can utilize SVE to generate more efficient code for > scalar mode popcount too. > > PR target/113860 > > gcc/ChangeLog: > > * config/aarch64/aarch64-simd.md (popcount2): Update pattern to > also support V1DI mode. > * config/aarch64/aarch64.md (popcount2): Add TARGET_SVE support. > * config/aarch64/iterators.md (VDQHSD_V1DI): New mode iterator. > (SVE_VDQ_I): Add V1DI. > (bitsize): Likewise. > (VPRED): Likewise. > (VEC_POP_MODE): New mode attribute. > (vec_pop_mode): Likewise. > > gcc/testsuite/ChangeLog: > > * gcc.target/aarch64/popcnt11.c: New test. Sorry for the slow review of this. The main reason for putting it off was the use of V1DI, which always makes me nervous. In particular: > @@ -2284,7 +2286,7 @@ (define_mode_attr VPRED [(VNx16QI "VNx16BI") (VNx8QI > "VNx8BI") >(VNx8DI "VNx2BI") (VNx8DF "VNx2BI") >(V8QI "VNx8BI") (V16QI "VNx16BI") >(V4HI "VNx4BI") (V8HI "VNx8BI") (V2SI "VNx2BI") > - (V4SI "VNx4BI") (V2DI "VNx2BI")]) > + (V4SI "VNx4BI") (V2DI "VNx2BI") (V1DI "VNx2BI")]) > it seems odd to have a predicate mode that contains more elements than the associated single-vector data mode. The patch also extends the non-SVE SIMD popcount pattern for V1DI, but it doesn't look like that path works. E.g. try the following with -march=armv8-a -fgimple -O2: __Uint64x1_t __GIMPLE foo (__Uint64x1_t x) { __Uint64x1_t z; z = .POPCOUNT (x); return z; } Thanks, Richard > ;; ...and again in lower case. > (define_mode_attr vpred [(VNx16QI "vnx16bi") (VNx8QI "vnx8bi") > @@ -2318,6 +2320,14 @@ (define_mode_attr VDOUBLE [(VNx16QI "VNx32QI") > (VNx4SI "VNx8SI") (VNx4SF "VNx8SF") > (VNx2DI "VNx4DI") (VNx2DF "VNx4DF")]) > > +;; The Advanced SIMD modes of popcount corresponding to scalar modes. > +(define_mode_attr VEC_POP_MODE [(QI "V8QI") (HI "V4HI") > + (SI "V2SI") (DI "V1DI")]) > + > +;; ...and again in lower case. 
> +(define_mode_attr vec_pop_mode [(QI "v8qi") (HI "v4hi") > + (SI "v2si") (DI "v1di")]) > + > ;; On AArch64 the By element instruction doesn't have a 2S variant. > ;; However because the instruction always selects a pair of values > ;; The normal 3SAME instruction can be used here instead. > diff --git a/gcc/testsuite/gcc.target/aarch64/popcnt11.c > b/gcc/testsuite/gcc.target/aarch64/popcnt11.c > new file mode 100644 > index 000..595b2f9eb93 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/aarch64/popcnt11.c > @@ -0,0 +1,58 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O2 -march=armv8.2-a+sve" } */ > +/* { dg-final { check-function-bodies "**" "" "" } } */ > + > +/* > +** f_qi: > +** ldr b([0-9]+), \[x0\] > +** cnt v\1.8b, v\1.8b > +** smovw0, v\1.b\[0\] > +** ret > +*/ > +unsigned > +f_qi (unsigned char *a) > +{ > + return __builtin_popcountg (a[0]); > +} > + > +/* > +** f_hi: > +** ldr h([0-9]+), \[x0\] > +** ptrue (p[0-7]).b, all > +** cnt z\1.h, \2/m, z\1.h > +** smovw0, v\1.h\[0\] > +** ret > +*/ > +unsigned > +f_hi (unsigned short *a) > +{ > + return __builtin_popcountg (a[0]); > +} > + > +/* > +** f_si: > +** ldr s([0-9]+), \[x0\] > +** ptrue (p[0-7]).b, all > +** cnt z\1.s, \2/m, z\1.s > +** umovx0, v\1.d\[0\] > +** ret > +*/ > +unsigned > +f_si (unsigned int *a) > +{ > + return __builtin_popcountg (a[0]); > +} > + > +/* > +** f_di: > +** ldr d([0-9]+), \[x0\] > +** ptrue (p[0-7])\.b, all > +** cnt z\1\.d, \2/m, z\1\.d > +** fmovx0, d\1 > +** ret > +*/ > +unsigned > +f_di (unsigned long *a) > +{ > + return __builtin_popcountg (a[0]); > +}
Re: [PATCH v2 6/9] aarch64: Use symbols without offset to prevent relocation issues
Martin Storsjö writes: > On Thu, 12 Sep 2024, Evgeny Karpov wrote: > >> The current binutils implementation does not support offset up to 4GB in >> IMAGE_REL_ARM64_PAGEBASE_REL21 relocation and is limited to 1MB. >> This is related to differences in ELF and COFF relocation records. > > Yes, I agree. > > But I would not consider this a limitation of the binutils implementation, > this is a limitation of the object file format. It can't be worked around > by inventing your own custom relocations, but should instead worked around > on the code generation side, to avoid needing such large offsets. > > This approach is one such, quite valid. Another one is to generate extra > symbols to allow addressing anything with a smaller offset. Maybe this is my ELF bias showing, but: generating extra X=Y+OFF symbols isn't generally valid for ELF when Y is a global symbol, since interposition rules, comdat, weak symbols, and various other reasons, could mean that the local definition of Y isn't the one that gets used. Does COFF cope with that in some other way? If not, I would have expected that there would need to be a fallback path that didn't involve defining extra symbols. Thanks, Richard
Re: [PATCH] SVE intrinsics: Fold svmul with all-zero operands to zero vector
Jennifer Schmitz writes: >> On 18 Sep 2024, at 20:33, Richard Sandiford >> wrote: >> >> External email: Use caution opening links or attachments >> >> >> Jennifer Schmitz writes: >>> From 05e010a4ad5ef8df082b3e03b253aad85e2a270c Mon Sep 17 00:00:00 2001 >>> From: Jennifer Schmitz >>> Date: Tue, 17 Sep 2024 00:15:38 -0700 >>> Subject: [PATCH] SVE intrinsics: Fold svmul with all-zero operands to zero >>> vector >>> >>> As recently implemented for svdiv, this patch folds svmul to a zero >>> vector if one of the operands is a zero vector. This transformation is >>> applied if at least one of the following conditions is met: >>> - the first operand is all zeros or >>> - the second operand is all zeros, and the predicate is ptrue or the >>> predication is _x or _z. >>> >>> In contrast to constant folding, which was implemented in a previous >>> patch, this transformation is applied as soon as one of the operands is >>> a zero vector, while the other operand can be a variable. >>> >>> The patch was bootstrapped and regtested on aarch64-linux-gnu, no >>> regression. >>> OK for mainline? >>> >>> Signed-off-by: Jennifer Schmitz >> >> OK, thanks. >> >> If you're planning any more work in this area, I think the next logical >> step would be to extend the current folds to all predication types, >> before going on to support other mul/div cases or other operations. >> >> In principle, the mul and div cases correspond to: >> >> if (integer_zerop (op1) || integer_zerop (op2)) >>return f.fold_active_lanes_to (build_zero_cst (TREE_TYPE (f.lhs))); >> >> It would then be up to fold_active_lanes_to(X) to work out how to apply >> predication to X. The general case would be: >> >> - For x predication and unpredicated operations, fold to X. >> >> - For m and z, calculate a vector that supplies the values of inactive >>lanes (the first vector argument for m and a zero vector from z). >> >>- If X is equal to the inactive lanes vector, fold directly to X. 
>> >>- Otherwise fold to VEC_COND_EXPR > Dear Richard, > I pushed it to trunk with 08aba2dd8c9390b6131cca0aac069f97eeddc9d2. > Thank you also for the good suggestion, I will do that. During the last days, > I have been working on a patch that folds multiplication by powers of 2 to > left-shifts (svlsl), similar to for division. As I see it, that is > independent from what you proposed, because it is a change of the function > type. Can I submit it for review before starting on the patch you suggested? Sure! I agree the power-of-two fold is independent. I was just worried about building up technical debt if we added more fold-to-constant cases. Thanks, Richard
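The general case outlined in the quoted review can be modelled lane by lane in scalar C. This is only a sketch of the semantics under stated assumptions: `model_fold_active_lanes_to` and `enum pred_kind` are hypothetical names, vectors are plain arrays, and the real fold would operate on GIMPLE rather than on values. Active lanes take the folded value X; inactive lanes take the first vector argument for _m and zero for _z; for _x the inactive lanes are unspecified, so X is a valid choice everywhere.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* The three predication types of the SVE intrinsics.  */
enum pred_kind { PRED_X, PRED_M, PRED_Z };

/* Scalar model of folding the active lanes of a predicated operation to
   FOLDED.  PG is the governing predicate and OP1 the first vector
   argument, which supplies the inactive lanes under _m predication.  */
static void model_fold_active_lanes_to(enum pred_kind kind, const bool *pg,
                                       const int32_t *op1,
                                       const int32_t *folded,
                                       int32_t *out, size_t n)
{
  for (size_t i = 0; i < n; i++)
    {
      if (kind == PRED_X || pg[i])
        out[i] = folded[i];   /* active lane, or "don't care" under _x */
      else if (kind == PRED_M)
        out[i] = op1[i];      /* _m: inactive lanes keep the first operand */
      else
        out[i] = 0;           /* _z: inactive lanes are zeroed */
    }
}
```

With X equal to zero, the model gives an all-zero result for _x and _z whatever the predicate is, and for _m only when the first operand is itself zero or every lane is active, which lines up with the conditions listed in the patch description.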
Re: [Fortran, Patch, PR106606, v1] Fortran: Break recursion building recursive types. [PR106606]
Hi Thomas, thanks for review. Committed with the changes requested as: gcc-15-3711-gde915fbe3cb Thanks again. Regards, Andre On Wed, 18 Sep 2024 18:24:19 +0200 Thomas Koenig wrote: > Hi Andre, > > > Regtested ok on x86_64-pc-linux-gnu / Fedora 39. Ok for mainline? > > Extremely minor nit: In the commit message and ChangeLog entry, > > Build a derived type component's type only, when it is not already being > build and the component uses pointer semantics. > > I believe that should be "being built". > > In the ChangeLog entry > > derived types as component's types when they are not yet build. > > s/build/built/ > > OK for trunk. > > Thanks for the patch! > > Best regards > > Thomas > > -- Andre Vehreschild * Email: vehre ad gmx dot de
RE: [PATCH v5 4/4] RISC-V: Fix vector SAT_ADD dump check due to middle-end change
> So for the future I'd suggest you post those with a remark that you think > they're obvious and going to commit in a day (or some other reasonable > timeframe) if there are no complaints. Oh, I see. Thanks Robin for the reminder. That would be perfect. Do you have any best practices for the remark "obvious"? Like [NFC] in the subject to give a hint of no functional change; maybe use [TBO] to stand for to-be-obvious, or something like that. Pan -Original Message- From: Robin Dapp Sent: Thursday, September 19, 2024 4:26 PM To: Li, Pan2 ; gcc-patches@gcc.gnu.org Cc: richard.guent...@gmail.com; tamar.christ...@arm.com; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; jeffreya...@gmail.com; Robin Dapp Subject: Re: [PATCH v5 4/4] RISC-V: Fix vector SAT_ADD dump check due to middle-end change > This patch would like fix the dump check times of vector SAT_ADD. The > middle-end change makes the match times from 2 to 4 times. > > The below test suites are passed for this patch. > * The rv64gcv fully regression test. That's OK. And I think testsuite fixup patches like this you can consider "obvious" as long as you're sure the underlying reason is understood. In particular as you have been working in the saturating space for a while now. So for the future I'd suggest you post those with a remark that you think they're obvious and going to commit in a day (or some other reasonable timeframe) if there are no complaints. -- Regards Robin
Re: [Ping, Fortran, Patch, PR85002, v1] Fix deep-copy of alloc. comps. in coarrays ICEing and crashing w/ lib.
Hi Thomas, committed as gcc-15-3707-g361903ad1af. Thanks for the review. I am reviewing your unsigned work at the moment. Thanks again and regards, Andre On Wed, 18 Sep 2024 18:18:20 +0200 Thomas Koenig wrote: > On 18.09.24 at 12:31, Andre Vehreschild wrote: > > Regtested ok on x86_64-pc-linux-gnu / F39. Ok for mainline? > > OK. > > Thanks for the patch! > > Best regards > > Thomas -- Andre Vehreschild * Email: vehre ad gmx dot de
Re: [gcc-wwwdocs PATCH] gcc-14: Mention -march=gracemont support in x86_64
On Thu, 19 Sep 2024, Haochen Jiang wrote: > When I was backporting my doc patch in gcc trunk today, I found when > adding -march=gracemont in GCC14, the corresponding wwwdoc is missing. > This patch is adding that. This looks fine, thank you. Gerald
Re: [PATCH v2 5/9] aarch64: Multiple adjustments to support the SMALL code model correctly
Evgeny Karpov writes: > LOCAL_LABEL_PREFIX has been changed to help the assembly > compiler recognize local labels. Emitting locals has been > replaced with the .lcomm directive to declare uninitialized > data without defining an exact section. Functions and objects > were missing declarations. Binutils was not able to distinguish > static from external, or an object from a function. > mingw_pe_declare_object_type has been added to have type > information for relocation on AArch64, which is not the case > for ix86. > > This fix relies on changes in binutils. > aarch64: Relocation fixes and LTO > https://sourceware.org/pipermail/binutils/2024-August/136481.html > > gcc/ChangeLog: > > * config/aarch64/aarch64-coff.h (LOCAL_LABEL_PREFIX): > Use "." as the local label prefix. > (ASM_OUTPUT_ALIGNED_LOCAL): Remove. > (ASM_OUTPUT_LOCAL): New. > * config/aarch64/cygming.h (ASM_OUTPUT_EXTERNAL_LIBCALL): > Update. > (ASM_DECLARE_OBJECT_NAME): New. > (ASM_DECLARE_FUNCTION_NAME): New. > * config/i386/cygming.h (ASM_DECLARE_COLD_FUNCTION_NAME): > Update. > (ASM_OUTPUT_EXTERNAL_LIBCALL): Update. > * config/mingw/winnt.cc (mingw_pe_declare_function_type): > Rename into ... > (mingw_pe_declare_type): ... this. > (i386_pe_start_function): Update. > * config/mingw/winnt.h (mingw_pe_declare_function_type): > Rename into ... > (mingw_pe_declare_type): ... this. 
> --- > gcc/config/aarch64/aarch64-coff.h | 22 ++ > gcc/config/aarch64/cygming.h | 18 +- > gcc/config/i386/cygming.h | 8 > gcc/config/mingw/winnt.cc | 18 +- > gcc/config/mingw/winnt.h | 3 +-- > 5 files changed, 37 insertions(+), 32 deletions(-) > > diff --git a/gcc/config/aarch64/aarch64-coff.h > b/gcc/config/aarch64/aarch64-coff.h > index 81fd9954f75..17f346fe540 100644 > --- a/gcc/config/aarch64/aarch64-coff.h > +++ b/gcc/config/aarch64/aarch64-coff.h > @@ -20,9 +20,8 @@ > #ifndef GCC_AARCH64_COFF_H > #define GCC_AARCH64_COFF_H > > -#ifndef LOCAL_LABEL_PREFIX > -# define LOCAL_LABEL_PREFIX "" > -#endif > +#undef LOCAL_LABEL_PREFIX > +#define LOCAL_LABEL_PREFIX "." > > /* Using long long breaks -ansi and -std=c90, so these will need to be > made conditional for an LLP64 ABI. */ > @@ -54,19 +53,10 @@ > } > #endif > > -/* Output a local common block. /bin/as can't do this, so hack a > - `.space' into the bss segment. Note that this is *bad* practice, > - which is guaranteed NOT to work since it doesn't define STATIC > - COMMON space but merely STATIC BSS space. */ > -#ifndef ASM_OUTPUT_ALIGNED_LOCAL > -# define ASM_OUTPUT_ALIGNED_LOCAL(STREAM, NAME, SIZE, ALIGN) \ > -{ > \ > - switch_to_section (bss_section); > \ > - ASM_OUTPUT_ALIGN (STREAM, floor_log2 (ALIGN / BITS_PER_UNIT)); \ > - ASM_OUTPUT_LABEL (STREAM, NAME); > \ > - fprintf (STREAM, "\t.space\t%d\n", (int)(SIZE)); > \ > -} > -#endif > +#define ASM_OUTPUT_LOCAL(FILE, NAME, SIZE, ROUNDED) \ > +( fputs (".lcomm ", (FILE)), \ > + assemble_name ((FILE), (NAME)),\ > + fprintf ((FILE), ",%lu\n", (ROUNDED))) I'd expect this to be: "," HOST_WIDE_INT_PRINT_DEC "\n" rather than ",%lu\n". "long" generally shouldn't be used in GCC code, since it's such an ambiguous type. LGTM otherwise. Thanks, Richard
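The concern about ",%lu\n" can be sketched outside of GCC. The snippet below is only a standalone analogue, not GCC code: int64_t and PRId64 from <inttypes.h> stand in for HOST_WIDE_INT and HOST_WIDE_INT_PRINT_DEC, and format_lcomm is a hypothetical helper name. It shows the general idea of pairing a fixed-width type with its matching format macro, since the width of "long" differs between LP64 and LLP64 hosts.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper (not part of GCC): write ".lcomm NAME,SIZE\n"
   into BUF.  The size is a fixed-width integer and is printed with
   the matching <inttypes.h> macro rather than "%lu", so the format
   and the argument agree on every host.  */
static int
format_lcomm (char *buf, size_t len, const char *name, int64_t rounded)
{
  return snprintf (buf, len, ".lcomm %s,%" PRId64 "\n", name, rounded);
}
```

A size above 32 bits is the interesting case here: with "%lu" it would depend on whether the host's long is 32 or 64 bits wide.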
Re: [PATCH] libcpp: Add -Wtrailing-blanks warning
On Thu, Sep 19, 2024 at 09:07:06AM +0200, Jakub Jelinek wrote: > space is ' ' '\t' '\n' '\r' '\f' '\v' in the C locale, > blank is ' ' '\t' > cntrl is a lot of chars but not ' ' > if we extend by the safe-ctype > vspace '\r' '\n' > nvspace ' ' '\t' '\f' '\v' '\0' > Obviously, we shouldn't look at '\r' and '\n', those aren't trailing > characters, those are line separators. > > Would we need to consider all UTF-8 (or EBCDIC-UTF) control characters is > cntrl? > ..0009; Control # Cc [10] .. > 000B..000C; Control # Cc [2] .. > 000E..001F; Control # Cc [18] .. > 007F..009F; Control # Cc [33] .. > 00AD ; Control # Cf SOFT HYPHEN > 061C ; Control # Cf ARABIC LETTER MARK > 180E ; Control # Cf MONGOLIAN VOWEL SEPARATOR > 200B ; Control # Cf ZERO WIDTH SPACE > 200E..200F; Control # Cf [2] LEFT-TO-RIGHT MARK..RIGHT-TO-LEFT MARK > 2028 ; Control # Zl LINE SEPARATOR > 2029 ; Control # Zp PARAGRAPH SEPARATOR > 202A..202E; Control # Cf [5] LEFT-TO-RIGHT EMBEDDING..RIGHT-TO-LEFT > OVERRIDE > 2060..2064; Control # Cf [5] WORD JOINER..INVISIBLE PLUS > 2065 ; Control # Cn > 2066..206F; Control # Cf [10] LEFT-TO-RIGHT ISOLATE..NOMINAL DIGIT SHAPES > FEFF ; Control # Cf ZERO WIDTH NO-BREAK SPACE > FFF0..FFF8; Control # Cn [9] .. > FFF9..FFFB; Control # Cf [3] INTERLINEAR ANNOTATION ANCHOR..INTERLINEAR > ANNOTATION TERMINATOR > 13430..1343F ; Control # Cf [16] EGYPTIAN HIEROGLYPH VERTICAL > JOINER..EGYPTIAN HIEROGLYPH END WALLED ENCLOSURE > 1BCA0..1BCA3 ; Control # Cf [4] SHORTHAND FORMAT LETTER OVERLAP..SHORTHAND > FORMAT UP STEP > 1D173..1D17A ; Control # Cf [8] MUSICAL SYMBOL BEGIN BEAM..MUSICAL SYMBOL > END PHRASE > E ; Control # Cn > E0001 ; Control # Cf LANGUAGE TAG > E0002..E001F ; Control # Cn [30] .. > E0080..E00FF ; Control # Cn [128] .. > E01F0..E0FFF ; Control # Cn [3600] .. 
> > Wonder why anybody would be interested to find just trailing spaces and not > trailing tabs or vice versa, so if we have categories, blank would be one, > then perhaps nvspace as something not including '\0', so just ' ' '\t' '\f' > '\v' and if really needed, control characters with added ' ', but how to > call that and would it really need to parse UTF-8/EBCDIC and look at > pregenerated tables? And there are also: 0009..000D; White_Space # Cc [5] .. 0020 ; White_Space # Zs SPACE 0085 ; White_Space # Cc 00A0 ; White_Space # Zs NO-BREAK SPACE 1680 ; White_Space # Zs OGHAM SPACE MARK 2000..200A; White_Space # Zs [11] EN QUAD..HAIR SPACE 2028 ; White_Space # Zl LINE SEPARATOR 2029 ; White_Space # Zp PARAGRAPH SEPARATOR 202F ; White_Space # Zs NARROW NO-BREAK SPACE 205F ; White_Space # Zs MEDIUM MATHEMATICAL SPACE 3000 ; White_Space # Zs IDEOGRAPHIC SPACE Jakub
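One of the categories discussed above (nvspace without '\0', i.e. ' ' '\t' '\f' '\v', with '\r' and '\n' treated as line separators) could be sketched as follows. This is a minimal illustration under those assumptions, not the libcpp implementation; is_trailing_blank and has_trailing_blanks are hypothetical names, and only single-byte characters are considered, so none of the Unicode White_Space or control code points listed in the thread are handled.

```c
#include <stdbool.h>
#include <stddef.h>

/* Blank characters that may count as trailing: ' ', '\t', '\f', '\v'.
   '\0' is deliberately excluded, as are the line separators.  */
static bool
is_trailing_blank (char c)
{
  return c == ' ' || c == '\t' || c == '\f' || c == '\v';
}

/* Return true if LINE (LEN bytes, optionally ending in "\n" or "\r\n")
   has a blank character immediately before the line terminator.  */
static bool
has_trailing_blanks (const char *line, size_t len)
{
  /* '\r' and '\n' are line separators, not trailing characters.  */
  while (len > 0 && (line[len - 1] == '\n' || line[len - 1] == '\r'))
    len--;
  return len > 0 && is_trailing_blank (line[len - 1]);
}
```

Note that in this sketch a line consisting only of blanks is also reported, and an empty line is not.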
Re: [PATCH v2 6/9] aarch64: Use symbols without offset to prevent relocation issues
Evgeny Karpov writes: > The current binutils implementation does not support offset up to 4GB in > IMAGE_REL_ARM64_PAGEBASE_REL21 relocation and is limited to 1MB. > This is related to differences in ELF and COFF relocation records. > There are ways to fix this. This work on relocation change will be extracted > to > a separate binutils patch series and discussion. > > To unblock the current patch series, the IMAGE_REL_ARM64_PAGEBASE_REL21 > relocation will remain unchanged, and the workaround below will be applied to > bypass the 1MB offset limitation. > > Regards, > Evgeny > > > The patch will be replaced by this change. Seems like a reasonable workaround to me FWIW, but some comments on the implementation below: > > diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc > index 03362a975c0..5f17936df1f 100644 > --- a/gcc/config/aarch64/aarch64.cc > +++ b/gcc/config/aarch64/aarch64.cc > @@ -2896,7 +2896,30 @@ aarch64_load_symref_appropriately (rtx dest, rtx imm, > if (can_create_pseudo_p ()) > tmp_reg = gen_reg_rtx (mode); > > - emit_move_insn (tmp_reg, gen_rtx_HIGH (mode, copy_rtx (imm))); > + do > + { > + if (TARGET_PECOFF) > + { > + poly_int64 offset; > + HOST_WIDE_INT const_offset; > + strip_offset (imm, &offset); > + > + if (offset.is_constant (&const_offset) > + && abs(const_offset) >= 1 << 20) abs_hwi (const_offset) (since const_offset has HOST_WIDE_INT type). > + { > + rtx const_int = imm; > + const_int = XEXP (const_int, 0); > + XEXP (const_int, 1) = GEN_INT(const_offset % (1 << 20)); CONST_INTs are shared objects, so we can't modify their value in-place. It might be easier to pass base and const_offset from the caller (aarch64_expand_mov_immediate). We are then guaranteed that the offset is constant and don't need to worry about the SVE case. The new SYM+OFF expression can be calculated using plus_constant.
I think it'd be worth asserting that the offset fits in 32 bits, since if by some bug the offset is larger, we'd generate silent wrong code (in the sense that the compiler would truncate the offset before the assembler sees it). > + > + emit_move_insn (tmp_reg, gen_rtx_HIGH (mode, > copy_rtx(imm))); > + emit_insn (gen_add_hioffset (tmp_reg, > GEN_INT(const_offset))); I think the normal addition patterns can handle this, if we pass the result of the ~0xf calculation. There should be no need for a dedicated pattern. > + break; > + } > + } > + > + emit_move_insn (tmp_reg, gen_rtx_HIGH (mode, copy_rtx (imm))); > + } while(0); I think it'd be clearer to duplicate the gen_add_losym and avoid the do...while(0) Thanks, Richard > + > emit_insn (gen_add_losym (dest, tmp_reg, imm)); > return; >} > diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md > index 665a333903c..072110f93e7 100644 > --- a/gcc/config/aarch64/aarch64.md > +++ b/gcc/config/aarch64/aarch64.md > @@ -7405,6 +7405,13 @@ >DONE; > }) > > +(define_insn "add_hioffset" > + [(match_operand 0 "register_operand") > + (match_operand 1 "const_int_operand")] > + "" > + "add %0, %0, (%1 & ~0xf) >> 12, lsl #12" > +) > + > (define_insn "add_losym_" >[(set (match_operand:P 0 "register_operand" "=r") > (lo_sum:P (match_operand:P 1 "register_operand" "r")
Re: [patch, fortran] Add random numbers and fix some bugs.
Hi Thomas, submitting your patch as part of the mail got it corrupted by some mailer adding line breaks. It does not apply for me. Because I can't test it, I have more questions, see below: On Wed, 18 Sep 2024 22:22:15 +0200 Thomas Koenig wrote: > This patch adds random number support for UNSIGNED, plus fixes > two bugs, with array I/O where the type used to be set to BT_INTEGER, > and for division with the divisor being a constant. > > Again, depends on prevous submissions. > > OK for trunk? > > gcc/fortran/ChangeLog: > > * check.cc (gfc_check_random_number): Adjust for unsigned. > * iresolve.cc (gfc_resolve_random_number): Handle unsinged. Hihi, I do this typo, too, over and over again: s/unsinged/unsigned/ > * trans-expr.cc (gfc_conv_expr_op): Handle BT_UNSIGNED for divide. > * trans-types.cc (gfc_get_dtype_rank_type): Handle BT_UNSIGNED. > * gfortran.texi: Add RANDOM_NUMBER for UNSIGNED. > > diff --git a/gcc/fortran/check.cc b/gcc/fortran/check.cc > index 533c9d7d343..1851cfb8d4a 100644 > --- a/gcc/fortran/check.cc > +++ b/gcc/fortran/check.cc > @@ -7007,8 +7007,14 @@ gfc_check_random_init (gfc_expr *repeatable, > gfc_expr *image_distinct) > bool > gfc_check_random_number (gfc_expr *harvest) > { > - if (!type_check (harvest, 0, BT_REAL)) > -return false; > + if (flag_unsigned) > +{ > + if (!type_check2 (harvest, 0, BT_REAL, BT_UNSIGNED)) > + return false; When the second argument is a BT_INTEGER, does this fail here? > +} > + else > +if (!type_check (harvest, 0, BT_REAL)) > + return false; > > if (!variable_check (harvest, 0, false)) > return false; Regards, Andre -- Andre Vehreschild * Email: vehre ad gmx dot de
Re: [Patch] OpenMP: Add get_device_from_uid/omp_get_uid_from_device routines
Hi Tobias, in the changelog of libgomp: * fortran.c (omp_get_uid_from_device_, omp_get_uid_from_device_8_): Add. "Add." what? Can you be more specific, i.e. is it just a dummy or prototype? In the libgomp/libgomp.texi +@node omp_get_uid_from_device +@subsection @code{omp_get_uid_from_device} -- Obtain the unique id of a device +@table @asis +@item @emph{Description}: +This function returns a pointer to _a_ string that represents a unique identifier ^^^ +(UID) for the device specified by @var{device_num}. It returns a ... @@ -6604,6 +6673,12 @@ The implementation remark: @code{omp_thread_mem_alloc}, all use low-latency memory as first preference, and fall back to main graphics memory when the low-latency pool is exhausted. +@item The unique identifier (UID), used with OpenMP's API UID routine, consists + of the @samp{GPU-} prefix followed by the 16-bytes UUID as returned by + the CUDA runtime library. This UUID is output in grouped lower-case + hex digits; the grouping of those 32 digits is: 8 digits, hyphen, + 4 digits, hyphen, 4 digits, hyphen, 16 digits. The output matches the + format used by @code{nvidia-smi}. @end itemize Do I get this right, that for CUDA this is, e.g. GPU-0123456789abdcef ? Then why is the "normal" UUID display format described here? This confuses me. (Just curiosity.) Er, and when I read further on, I find the nvptx implementation and that contradicts the description. There a "normal" UUID is added to the GPU- id. So you might want to make that implementation remark more clear Sorry for the bickering. I just stumbled over that while waiting for a regression test. The remainder looks reasonable to me. Regards, Andre On Thu, 19 Sep 2024 15:23:54 +0200 Tobias Burnus wrote: > Hi all, > > in order to know and potentially re-use a specific offload device > (reproducibility, affinity wise close to a CPU (socket), …) a mapping between > an (universal?) unique identifier and the OpenMP device number is useful. > Thus, TR13 added support for it. 
> > This is a collateral patch caused by looking at the API routines for other > reasons and looking at that part of the spec during the OpenMP F2F. > > Besides the added API routines, the UID will be used elsewhere: > * In context selectors: 'target_device' supports 'uid()'. > * In the OMP_AVAILABLE_DEVICES and OMP_DEFAULT_DEVICE env vars. > > @Sandra: Besides the usual .texi part, for the 'target_device' trait set: > if you add a new GOMP routine for kind/arch/isa - can you also add an > UID argument such that we don't have to update the API when needing in the > not so far future. > > @Andrew + @Thomas: Any comment? Especially to the nvptx/gcn side (plugin + > .texi)? > > @Jakub or anyone else — any comments, suggestions, remarks? > > [The patch was tested without GPUs, with one Nvidia GPU and one AMD GPU > and seems to work fine.] > > Tobias -- Andre Vehreschild * Email: vehre ad gmx dot de
Re: [PATCH] RISC-V: testsuite: Fix SELECT_VL SLP fallout.
On 9/19/24 7:24 AM, Robin Dapp wrote: Hi, this fixes asm-scan fallout from r15-3712-g5e3a4a01785e2d where we allow SLP with SELECT_VL. Assisted by sed and regtested on rv64gcv_zvfh_zvbb. Rather lengthy but obvious, so going to commit after a while if the CI is happy. I think those tests don't really need to check for vsetvl anyway, not all of them at least but I didn't change that for now. Methodology works for me. jeff
[Fortran, Patch, PR84870, v1] Fix ICE and allocated memory not assigned correctly.
Hi all, in PR84870 an ICE was reported that has since been fixed by some other patch. Nevertheless, a testcase revealed that the memory handling was still not correct, i.e. the test case in the patch reported 2 for both x.b.a and y.b.a, which is wrong. For a coarray, all memory is allocated using an array descriptor. For scalars, just a temporary descriptor is created and handed to the caf-register routine. The error here was that the memory handed back in the temporary descriptor was not used for the memory in the component, so the pointer in the component was not updated. The patch fixes this. Regtests ok on x86_64-pc-linux-gnu / Fedora 39. Ok for mainline? Regards, Andre -- Andre Vehreschild * Email: vehre ad gmx dot de From c26e97a8196fc26abf36a0bad6ffd6f9da7ba5d8 Mon Sep 17 00:00:00 2001 From: Andre Vehreschild Date: Thu, 19 Sep 2024 15:09:52 +0200 Subject: [PATCH] Fortran: Assign allocated caf-memory to scalar members [PR84870] Allocating a coarray required an array-descriptor. For scalars a temporary descriptor was created. Assigning the allocated memory from the temporary descriptor back to the scalar is now added. gcc/fortran/ChangeLog: PR fortran/84870 * trans-array.cc (duplicate_allocatable_coarray): For scalar allocatable components the memory allocated is now assigned to the component's pointer. gcc/testsuite/ChangeLog: * gfortran.dg/coarray/alloc_comp_10.f90: New test.
--- gcc/fortran/trans-array.cc| 2 ++ .../gfortran.dg/coarray/alloc_comp_10.f90 | 24 +++ 2 files changed, 26 insertions(+) create mode 100644 gcc/testsuite/gfortran.dg/coarray/alloc_comp_10.f90 diff --git a/gcc/fortran/trans-array.cc b/gcc/fortran/trans-array.cc index 838b6d3da80..3da7479fd10 100644 --- a/gcc/fortran/trans-array.cc +++ b/gcc/fortran/trans-array.cc @@ -9451,6 +9451,7 @@ duplicate_allocatable_coarray (tree dest, tree dest_tok, tree src, tree type, gfc_build_addr_expr (NULL_TREE, dest_tok), NULL_TREE, NULL_TREE, NULL_TREE, GFC_CAF_COARRAY_ALLOC_REGISTER_ONLY); + gfc_add_modify (&block, dest, gfc_conv_descriptor_data_get (dummy_desc)); null_data = gfc_finish_block (&block); gfc_init_block (&block); @@ -9460,6 +9461,7 @@ duplicate_allocatable_coarray (tree dest, tree dest_tok, tree src, tree type, gfc_build_addr_expr (NULL_TREE, dest_tok), NULL_TREE, NULL_TREE, NULL_TREE, GFC_CAF_COARRAY_ALLOC); + gfc_add_modify (&block, dest, gfc_conv_descriptor_data_get (dummy_desc)); tmp = builtin_decl_explicit (BUILT_IN_MEMCPY); tmp = build_call_expr_loc (input_location, tmp, 3, dest, src, diff --git a/gcc/testsuite/gfortran.dg/coarray/alloc_comp_10.f90 b/gcc/testsuite/gfortran.dg/coarray/alloc_comp_10.f90 new file mode 100644 index 000..a31d005498c --- /dev/null +++ b/gcc/testsuite/gfortran.dg/coarray/alloc_comp_10.f90 @@ -0,0 +1,24 @@ +!{ dg-do run } + +! Check that copying of memory for allocated scalar is assigned +! to coarray object. + +! Contributed by G. Steinmetz + +program p + type t +integer, allocatable :: a + end type + type t2 +type(t), allocatable :: b + end type + type(t2) :: x, y[*] + + x%b = t(1) + y = x + y%b%a = 2 + + if (x%b%a /= 1) stop 1 + if (y%b%a /= 2) stop 2 +end + -- 2.46.0
[PATCH] s390: Remove -m{,no-}lra option
I have been missing the two test cases and removed them since they depend on -mno-lra. -- 8< -- Since the old reload pass is about to be removed and we defaulted to LRA for over a decade, remove option -m{,no-}lra. PR target/113953 gcc/ChangeLog: * config/s390/s390.cc (s390_lra_p): Remove. (TARGET_LRA_P): Remove. * config/s390/s390.opt (mlra): Remove. * config/s390/s390.opt.urls (mlra): Remove. gcc/testsuite/ChangeLog: * gcc.target/s390/TI-constants-nolra.c: Removed. * gcc.target/s390/pr79895.c: Removed. --- gcc/config/s390/s390.cc | 10 gcc/config/s390/s390.opt | 4 -- gcc/config/s390/s390.opt.urls | 2 - .../gcc.target/s390/TI-constants-nolra.c | 47 --- gcc/testsuite/gcc.target/s390/pr79895.c | 9 5 files changed, 72 deletions(-) delete mode 100644 gcc/testsuite/gcc.target/s390/TI-constants-nolra.c delete mode 100644 gcc/testsuite/gcc.target/s390/pr79895.c diff --git a/gcc/config/s390/s390.cc b/gcc/config/s390/s390.cc index c9172d1153a..25d43ae3e13 100644 --- a/gcc/config/s390/s390.cc +++ b/gcc/config/s390/s390.cc @@ -11342,13 +11342,6 @@ s390_can_change_mode_class (machine_mode from_mode, return true; } -/* Return true if we use LRA instead of reload pass. */ -static bool -s390_lra_p (void) -{ - return s390_lra_flag; -} - /* Return true if register FROM can be eliminated via register TO. */ static bool @@ -18444,9 +18437,6 @@ s390_c_mode_for_floating_type (enum tree_index ti) #undef TARGET_LEGITIMATE_CONSTANT_P #define TARGET_LEGITIMATE_CONSTANT_P s390_legitimate_constant_p -#undef TARGET_LRA_P -#define TARGET_LRA_P s390_lra_p - #undef TARGET_CAN_ELIMINATE #define TARGET_CAN_ELIMINATE s390_can_eliminate diff --git a/gcc/config/s390/s390.opt b/gcc/config/s390/s390.opt index a5b5aa95a12..23ea4b8232d 100644 --- a/gcc/config/s390/s390.opt +++ b/gcc/config/s390/s390.opt @@ -229,10 +229,6 @@ Set the branch costs for conditional branch instructions. Reasonable values are small, non-negative integers. The default branch cost is 1. 
-mlra -Target Var(s390_lra_flag) Init(1) Save -Use LRA instead of reload. - mpic-data-is-text-relative Target Var(s390_pic_data_is_text_relative) Init(TARGET_DEFAULT_PIC_DATA_IS_TEXT_RELATIVE) Assume data segments are relative to text segment. diff --git a/gcc/config/s390/s390.opt.urls b/gcc/config/s390/s390.opt.urls index ab1e761efa8..bc772d2ffc7 100644 --- a/gcc/config/s390/s390.opt.urls +++ b/gcc/config/s390/s390.opt.urls @@ -74,8 +74,6 @@ UrlSuffix(gcc/S_002f390-and-zSeries-Options.html#index-mzarch) ; skipping UrlSuffix for 'mbranch-cost=' due to finding no URLs -; skipping UrlSuffix for 'mlra' due to finding no URLs - ; skipping UrlSuffix for 'mpic-data-is-text-relative' due to finding no URLs ; skipping UrlSuffix for 'mindirect-branch=' due to finding no URLs diff --git a/gcc/testsuite/gcc.target/s390/TI-constants-nolra.c b/gcc/testsuite/gcc.target/s390/TI-constants-nolra.c deleted file mode 100644 index b9948fc4aa5..000 --- a/gcc/testsuite/gcc.target/s390/TI-constants-nolra.c +++ /dev/null @@ -1,47 +0,0 @@ -/* { dg-do compile { target int128 } } */ -/* { dg-options "-O3 -mno-lra" } */ - -/* 2x lghi */ -__int128 a() { - return 0; -} - -/* 2x lghi */ -__int128 b() { - return -1; -} - -/* 2x lghi */ -__int128 c() { - return -2; -} - -/* lghi + llilh */ -__int128 d() { - return 16000 << 16; -} - -/* lghi + llihf */ -__int128 e() { - return (unsigned long long)8 << 32; -} - -/* lghi + llihf */ -__int128 f() { - return (unsigned __int128)8 << 96; -} - -/* llihf + llihf - this is handled via movti_bigconst pattern */ -__int128 g() { - return ((unsigned __int128)8 << 96) | ((unsigned __int128)8 << 32); -} - -/* Literal pool */ -__int128 h() { - return ((unsigned __int128)8 << 32) | 1; -} - -/* Literal pool */ -__int128 i() { - return (((unsigned __int128)8 << 32) | 1) << 64; -} diff --git a/gcc/testsuite/gcc.target/s390/pr79895.c b/gcc/testsuite/gcc.target/s390/pr79895.c deleted file mode 100644 index 02374e4b8a8..000 --- a/gcc/testsuite/gcc.target/s390/pr79895.c 
+++ /dev/null @@ -1,9 +0,0 @@ -/* { dg-do compile { target int128 } } */ -/* { dg-options "-O1 -mno-lra" } */ - -unsigned __int128 g; -void -foo () -{ - g = (unsigned __int128)1 << 127; -} -- 2.45.2
[PATCH] s390: Remove -m{,no-}lra option
Since the old reload pass is about to be removed and we defaulted to LRA for over a decade, remove option -m{,no-}lra. PR target/113953 gcc/ChangeLog: * config/s390/s390.cc (s390_lra_p): Remove. (TARGET_LRA_P): Remove. * config/s390/s390.opt (mlra): Remove. * config/s390/s390.opt.urls (mlra): Remove. --- Assuming that bootstrap and regtest (which are still running) finish successful, ok for mainline? gcc/config/s390/s390.cc | 10 -- gcc/config/s390/s390.opt | 4 gcc/config/s390/s390.opt.urls | 2 -- 3 files changed, 16 deletions(-) diff --git a/gcc/config/s390/s390.cc b/gcc/config/s390/s390.cc index c9172d1153a..25d43ae3e13 100644 --- a/gcc/config/s390/s390.cc +++ b/gcc/config/s390/s390.cc @@ -11342,13 +11342,6 @@ s390_can_change_mode_class (machine_mode from_mode, return true; } -/* Return true if we use LRA instead of reload pass. */ -static bool -s390_lra_p (void) -{ - return s390_lra_flag; -} - /* Return true if register FROM can be eliminated via register TO. */ static bool @@ -18444,9 +18437,6 @@ s390_c_mode_for_floating_type (enum tree_index ti) #undef TARGET_LEGITIMATE_CONSTANT_P #define TARGET_LEGITIMATE_CONSTANT_P s390_legitimate_constant_p -#undef TARGET_LRA_P -#define TARGET_LRA_P s390_lra_p - #undef TARGET_CAN_ELIMINATE #define TARGET_CAN_ELIMINATE s390_can_eliminate diff --git a/gcc/config/s390/s390.opt b/gcc/config/s390/s390.opt index a5b5aa95a12..23ea4b8232d 100644 --- a/gcc/config/s390/s390.opt +++ b/gcc/config/s390/s390.opt @@ -229,10 +229,6 @@ Set the branch costs for conditional branch instructions. Reasonable values are small, non-negative integers. The default branch cost is 1. -mlra -Target Var(s390_lra_flag) Init(1) Save -Use LRA instead of reload. - mpic-data-is-text-relative Target Var(s390_pic_data_is_text_relative) Init(TARGET_DEFAULT_PIC_DATA_IS_TEXT_RELATIVE) Assume data segments are relative to text segment. 
diff --git a/gcc/config/s390/s390.opt.urls b/gcc/config/s390/s390.opt.urls index ab1e761efa8..bc772d2ffc7 100644 --- a/gcc/config/s390/s390.opt.urls +++ b/gcc/config/s390/s390.opt.urls @@ -74,8 +74,6 @@ UrlSuffix(gcc/S_002f390-and-zSeries-Options.html#index-mzarch) ; skipping UrlSuffix for 'mbranch-cost=' due to finding no URLs -; skipping UrlSuffix for 'mlra' due to finding no URLs - ; skipping UrlSuffix for 'mpic-data-is-text-relative' due to finding no URLs ; skipping UrlSuffix for 'mindirect-branch=' due to finding no URLs -- 2.45.2
Re: [patch, fortran] Implement IANY, IALL and IPARITY for unsigned
Hi Thomas, this looks fine to me. Ok for trunk. Thanks for the patch, Andre On Wed, 18 Sep 2024 22:20:44 +0200 Thomas Koenig wrote: > OK for trunk? > > This is based on the previous submissions. Again, this does not > generate a new library version; rather it re-uses the signed > integer version already present in the library. > > OK for trunk? > > Previous submissions (without which this will not work): > > https://gcc.gnu.org/pipermail/fortran/2024-September/060975.html > https://gcc.gnu.org/pipermail/fortran/2024-September/060987.html > > gcc/fortran/ChangeLog: > > * check.cc (gfc_check_transf_bit_intrins): Handle unsigned. > * gfortran.texi: Docment IANY, IALL and IPARITY for unsigned. > * iresolve.cc (gfc_resolve_iall): Set flag to use integer > if type is BT_UNSIGNED. > (gfc_resolve_iany): Likewise. > (gfc_resolve_iparity): Likewise. > * simplify.cc (do_bit_and): Adjust asserts for BT_UNSIGNED. > (do_bit_ior): Likewise. > (do_bit_xor): Likewise > > gcc/testsuite/ChangeLog: > > * gfortran.dg/unsigned_29.f90: New test.
> > gcc/fortran/check.cc | 14 ++- > gcc/fortran/gfortran.texi | 1 + > gcc/fortran/iresolve.cc | 6 +-- > gcc/fortran/simplify.cc | 51 +++ > gcc/testsuite/gfortran.dg/unsigned_29.f90 | 40 ++ > 5 files changed, 99 insertions(+), 13 deletions(-) > create mode 100644 gcc/testsuite/gfortran.dg/unsigned_29.f90 > > diff --git a/gcc/fortran/check.cc b/gcc/fortran/check.cc > index 7c630dd73f4..533c9d7d343 100644 > --- a/gcc/fortran/check.cc > +++ b/gcc/fortran/check.cc > @@ -4430,7 +4430,19 @@ gfc_check_mask (gfc_expr *i, gfc_expr *kind) > bool > gfc_check_transf_bit_intrins (gfc_actual_arglist *ap) > { > - if (ap->expr->ts.type != BT_INTEGER) > + bt type = ap->expr->ts.type; > + > + if (flag_unsigned) > +{ > + if (type != BT_INTEGER && type != BT_UNSIGNED) > + { > + gfc_error ("%qs argument of %qs intrinsic at %L must be INTEGER " > + "or UNSIGNED", gfc_current_intrinsic_arg[0]->name, > + gfc_current_intrinsic, &ap->expr->where); > + return false; > + } > +} > + else if (ap->expr->ts.type != BT_INTEGER) > { > gfc_error ("%qs argument of %qs intrinsic at %L must be INTEGER", >gfc_current_intrinsic_arg[0]->name, > diff --git a/gcc/fortran/gfortran.texi b/gcc/fortran/gfortran.texi > index e5ffe67..3eb8039c09f 100644 > --- a/gcc/fortran/gfortran.texi > +++ b/gcc/fortran/gfortran.texi > @@ -2789,6 +2789,7 @@ As of now, the following intrinsics take unsigned > arguments: > @item @code{RANGE} > @item @code{TRANSFER} > @item @code{SUM}, @code{PRODUCT}, @code{MATMUL} and @code{DOT_PRODUCT} > +@item @code{IANY}, @code{IALL} and @code{IPARITY} > @end itemize > This list will grow in the near future. 
> @c - > diff --git a/gcc/fortran/iresolve.cc b/gcc/fortran/iresolve.cc > index 92a591cf6d7..58a1821ef10 100644 > --- a/gcc/fortran/iresolve.cc > +++ b/gcc/fortran/iresolve.cc > @@ -1195,7 +1195,7 @@ gfc_resolve_hypot (gfc_expr *f, gfc_expr *x, > gfc_expr *y ATTRIBUTE_UNUSED) > void > gfc_resolve_iall (gfc_expr *f, gfc_expr *array, gfc_expr *dim, > gfc_expr *mask) > { > - resolve_transformational ("iall", f, array, dim, mask); > + resolve_transformational ("iall", f, array, dim, mask, true); > } > > > @@ -1223,7 +1223,7 @@ gfc_resolve_iand (gfc_expr *f, gfc_expr *i, > gfc_expr *j) > void > gfc_resolve_iany (gfc_expr *f, gfc_expr *array, gfc_expr *dim, > gfc_expr *mask) > { > - resolve_transformational ("iany", f, array, dim, mask); > + resolve_transformational ("iany", f, array, dim, mask, true); > } > > > @@ -1429,7 +1429,7 @@ gfc_resolve_long (gfc_expr *f, gfc_expr *a) > void > gfc_resolve_iparity (gfc_expr *f, gfc_expr *array, gfc_expr *dim, > gfc_expr *mask) > { > - resolve_transformational ("iparity", f, array, dim, mask); > + resolve_transformational ("iparity", f, array, dim, mask, true); > } > > > diff --git a/gcc/fortran/simplify.cc b/gcc/fortran/simplify.cc > index e5681c42a48..bd2f6485c95 100644 > --- a/gcc/fortran/simplify.cc > +++ b/gcc/fortran/simplify.cc > @@ -3401,9 +3401,20 @@ gfc_simplify_iachar (gfc_expr *e, gfc_expr *kind) > static gfc_expr * > do_bit_and (gfc_expr *result, gfc_expr *e) > { > - gcc_assert (e->ts.type == BT_INTEGER && e->expr_type == EXPR_CONSTANT); > - gcc_assert (result->ts.type == BT_INTEGER > - && result->expr_type == EXPR_CONSTANT); > + if (flag_unsigned) > +{ > + gcc_assert ((e->ts.type == BT_INTEGER || e->ts.type == BT_UNSIGNED) > + && e->expr_type == EXPR_CONSTANT); > + gcc_assert ((result->ts.type == BT_INTEGER > +|| result->ts.type == BT_UNSIGNED) > + && result->expr_type == E
Re: [Patch, Fortran] Implement Unsigned for SUM and PRODUCT
Hi Thomas, thanks for the patch. I have one proposal/question and one missing verb (IMO). Else the patch looks fine to me. Ok for trunk. > diff --git a/gcc/fortran/gfortran.texi b/gcc/fortran/gfortran.texi > index 829ab00c665..e5ffe67 100644 > --- a/gcc/fortran/gfortran.texi > +++ b/gcc/fortran/gfortran.texi > @@ -2788,7 +2788,7 @@ As of now, the following intrinsics take unsigned > arguments: @item @code{MVBITS} > @item @code{RANGE} > @item @code{TRANSFER} > -@item @code{MATMUL} and @code{DOT_PRODUCT} > +@item @code{SUM}, @code{PRODUCT}, @code{MATMUL} and @code{DOT_PRODUCT} How about sorting those alphabetically and putting each on a separate line? This might make it more viewable. Just a suggestion. > @end itemize > This list will grow in the near future. > @c - > diff --git a/gcc/fortran/iresolve.cc b/gcc/fortran/iresolve.cc > index 32b31432e58..92a591cf6d7 100644 > --- a/gcc/fortran/iresolve.cc > +++ b/gcc/fortran/iresolve.cc > @@ -175,9 +175,11 @@ resolve_bound (gfc_expr *f, gfc_expr *array, gfc_expr > *dim, gfc_expr *kind, > static void > resolve_transformational (const char *name, gfc_expr *f, gfc_expr *array, > - gfc_expr *dim, gfc_expr *mask) > + gfc_expr *dim, gfc_expr *mask, > + bool use_integer = false) > { >const char *prefix; > + bt type; > >f->ts = array->ts; > > @@ -200,9 +202,18 @@ resolve_transformational (const char *name, gfc_expr *f, > gfc_expr *array, gfc_resolve_dim_arg (dim); > } > > + /* For those intrinsic like SUM where we the integer version There is a verb missing here, IMO. ... where we _use_ the ... ??? > + actually uses unsigned, but we call it as the integer > + version. 
*/ > + > + if (use_integer && array->ts.type == BT_UNSIGNED) > +type = BT_INTEGER; > + else > +type = array->ts.type; > + >f->value.function.name > = gfc_get_string (PREFIX ("%s%s_%c%d"), prefix, name, > - gfc_type_letter (array->ts.type), > + gfc_type_letter (type), > gfc_type_abi_kind (&array->ts)); > } > Regards and thanks for the patch, Andre -- Andre Vehreschild * Email: vehre ad gmx dot de
Re: [PATCH v2 6/9] aarch64: Use symbols without offset to prevent relocation issues
Richard Sandiford writes: > Evgeny Karpov writes: >> + { >> + rtx const_int = imm; >> + const_int = XEXP (const_int, 0); >> + XEXP (const_int, 1) = GEN_INT(const_offset % (1 << 20)); > > CONST_INTs are shared objects, so we can't modify their value in-place. Gah, sorry, I misread. The patch was only modifying the PLUS, which should be valid. My comment below still stands though. > It might be easier to pass base and const_offset from the caller > (aarch64_expand_mov_immediate). We are then guaranteed that the > offset is constant and don't need to worry about the SVE case. > The new SYM+OFF expression can be calculated using plus_constant. > > I think it'd be worth asserting that the offset fits in 32 bits, > since if by some bug the offset is larger, we'd generate silent > wrong code (in the sense that the compiler would truncate the offset > before the assembler sees it).
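The arithmetic behind the two points above (keeping the symbol's offset below the 1MB limit, and checking that the full offset fits in 32 bits) can be sketched in plain C. This is an illustration only, under the assumption that the split follows the `const_offset % (1 << 20)` expression from the quoted patch; it does not model RTL, plus_constant or the COFF relocation itself. int64_t stands in for HOST_WIDE_INT, and offset_fits_32, split_offset and PAGEBASE_LIMIT are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

/* 1MB, the offset limit mentioned in the thread.  */
#define PAGEBASE_LIMIT ((int64_t) 1 << 20)

/* True if OFFSET is representable as a signed 32-bit value.  In the
   compiler this would be an assertion, so that an oversized offset is
   caught instead of being silently truncated.  */
static bool
offset_fits_32 (int64_t offset)
{
  return offset >= INT32_MIN && offset <= INT32_MAX;
}

/* Split OFFSET into *LOW, whose magnitude is below 1MB, and *HIGH, a
   multiple of 1MB, such that *HIGH + *LOW == OFFSET.  */
static void
split_offset (int64_t offset, int64_t *high, int64_t *low)
{
  /* The C remainder truncates toward zero, so *LOW keeps the sign of
     OFFSET and |*LOW| < PAGEBASE_LIMIT.  */
  *low = offset % PAGEBASE_LIMIT;
  *high = offset - *low;
}
```

Whether a negative offset should be split with a truncating remainder like this, or with a floor-style split, is left open here; the sketch only shows that the two parts always add back up to the original offset.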
Re: [PATCH] SVE intrinsics: Fold svmul with all-zero operands to zero vector
> On 18 Sep 2024, at 20:33, Richard Sandiford wrote: > > External email: Use caution opening links or attachments > > > Jennifer Schmitz writes: >> From 05e010a4ad5ef8df082b3e03b253aad85e2a270c Mon Sep 17 00:00:00 2001 >> From: Jennifer Schmitz >> Date: Tue, 17 Sep 2024 00:15:38 -0700 >> Subject: [PATCH] SVE intrinsics: Fold svmul with all-zero operands to zero >> vector >> >> As recently implemented for svdiv, this patch folds svmul to a zero >> vector if one of the operands is a zero vector. This transformation is >> applied if at least one of the following conditions is met: >> - the first operand is all zeros or >> - the second operand is all zeros, and the predicate is ptrue or the >> predication is _x or _z. >> >> In contrast to constant folding, which was implemented in a previous >> patch, this transformation is applied as soon as one of the operands is >> a zero vector, while the other operand can be a variable. >> >> The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression. >> OK for mainline? >> >> Signed-off-by: Jennifer Schmitz > > OK, thanks. > > If you're planning any more work in this area, I think the next logical > step would be to extend the current folds to all predication types, > before going on to support other mul/div cases or other operations. > > In principle, the mul and div cases correspond to: > > if (integer_zerop (op1) || integer_zerop (op2)) >return f.fold_active_lanes_to (build_zero_cst (TREE_TYPE (f.lhs))); > > It would then be up to fold_active_lanes_to(X) to work out how to apply > predication to X. The general case would be: > > - For x predication and unpredicated operations, fold to X. > > - For m and z, calculate a vector that supplies the values of inactive >lanes (the first vector argument for m and a zero vector from z). > >- If X is equal to the inactive lanes vector, fold directly to X. 
>
>   - Otherwise fold to VEC_COND_EXPR

Dear Richard,
I pushed it to trunk with 08aba2dd8c9390b6131cca0aac069f97eeddc9d2.
Thank you also for the good suggestion, I will do that.
During the last few days, I have been working on a patch that folds multiplication by powers of 2 to left-shifts (svlsl), similar to what was done for division. As I see it, that is independent of what you proposed, because it changes the function type. Can I submit it for review before starting on the patch you suggested?
Best,
Jennifer

>
> Richard
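As a reading aid, the predication rules Richard lists can be modelled lane by lane in plain C. This is only a scalar sketch of the described behaviour, not the aarch64 back-end code: `fold_active_lanes_to_model`, the `predication` enum and the array-based "vectors" are all made up for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical predication kinds: unpredicated, _x, _m and _z.  */
enum predication { PRED_NONE, PRED_X, PRED_M, PRED_Z };

/* Scalar model of folding a predicated operation to the value X.
   PG[i] says whether lane i is active, OP1 is the first vector argument.
   - unpredicated and _x: every lane may take X (inactive lanes are
     "don't care" for _x);
   - _m: inactive lanes keep the first vector argument;
   - _z: inactive lanes are zero.  */
static void
fold_active_lanes_to_model (enum predication pred, const bool *pg,
                            const int *op1, const int *x,
                            int *result, size_t n)
{
  for (size_t i = 0; i < n; i++)
    {
      if (pred == PRED_NONE || pred == PRED_X || pg[i])
        result[i] = x[i];
      else if (pred == PRED_M)
        result[i] = op1[i];
      else
        result[i] = 0;
    }
}
```

In this model, when X is the zero vector, the _z case yields X in every lane, which corresponds to the "fold directly to X" shortcut; the _m case with a non-zero first argument is the one that needs a per-lane select.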
[PATCH] Always dump generated distance vectors
There's special-casing for equal access functions which bypasses
printing the distance vectors.  The following makes sure we print
them always which helps debugging.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

	* tree-data-ref.cc (build_classic_dist_vector): Move
	distance vector dumping to single caller ...
	(subscript_dependence_tester): ... here, dumping always
	when we succeed computing it.
---
 gcc/tree-data-ref.cc | 34 ++
 1 file changed, 18 insertions(+), 16 deletions(-)

diff --git a/gcc/tree-data-ref.cc b/gcc/tree-data-ref.cc
index 26e6d9a5657..0f173e8803a 100644
--- a/gcc/tree-data-ref.cc
+++ b/gcc/tree-data-ref.cc
@@ -5547,21 +5547,6 @@ build_classic_dist_vector (struct data_dependence_relation *ddr,
 				    DDR_NB_LOOPS (ddr), 0));
     }
 
-  if (dump_file && (dump_flags & TDF_DETAILS))
-    {
-      unsigned i;
-
-      fprintf (dump_file, "(build_classic_dist_vector\n");
-      for (i = 0; i < DDR_NUM_DIST_VECTS (ddr); i++)
-	{
-	  fprintf (dump_file, " dist_vector = (");
-	  print_lambda_vector (dump_file, DDR_DIST_VECT (ddr, i),
-			       DDR_NB_LOOPS (ddr));
-	  fprintf (dump_file, " )\n");
-	}
-      fprintf (dump_file, ")\n");
-    }
-
   return true;
 }
 
@@ -5673,7 +5658,24 @@ subscript_dependence_tester (struct data_dependence_relation *ddr,
   compute_subscript_distance (ddr);
 
   if (build_classic_dist_vector (ddr, loop_nest))
-    build_classic_dir_vector (ddr);
+    {
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	{
+	  unsigned i;
+
+	  fprintf (dump_file, "(build_classic_dist_vector\n");
+	  for (i = 0; i < DDR_NUM_DIST_VECTS (ddr); i++)
+	    {
+	      fprintf (dump_file, " dist_vector = (");
+	      print_lambda_vector (dump_file, DDR_DIST_VECT (ddr, i),
+				   DDR_NB_LOOPS (ddr));
+	      fprintf (dump_file, " )\n");
+	    }
+	  fprintf (dump_file, ")\n");
+	}
+
+      build_classic_dir_vector (ddr);
+    }
 }
 
 /* Returns true when all the access functions of A are affine or
-- 
2.43.0
Re: [PATCH][v2] tree-optimization/116573 - .SELECT_VL for SLP
On Thu, 19 Sep 2024, Robin Dapp wrote:

> > On Tue, 17 Sep 2024, Richard Biener wrote:
> >
> > > The following restores the use of .SELECT_VL for testcases where it
> > > is safe to use even when using SLP.  I've for now restricted it
> > > to single-lane SLP plus optimistically allow store-lane nodes
> > > and assume single-lane roots are not widened but at most to
> > > load-lane who should be fine.
> > >
> > > v2 fixes latent issues in vectorizable_load/store.
> > >
> > > Bootstrap and regtest running on x86_64-unknown-linux-gnu.
> >
> > So while this fixes the earlier observed 80 regressions from not using
> > SLP this now introduces many more from the CI (800), all in other
> > scan-assembler tests where after checking a sample of one (sic!)
> > we seem to use .SELECT_VL more now but expect not to.  Unfortunately
> > none of the affected tests are runtime tests but at least for the
> > single test I investigated there is nothing wrong with using .SELECT_VL.
> >
> > I've checked the full CI results and as far I can see there are no
> > execute fails caused by this patch (I have locally done a full
> > check-gcc as well with a similar result).
> >
> > So I'm asking for explicit approval here.
> >
> > OK for trunk?
>
> Odd.  With my testing, rv64 only though, I haven't observed any
> additional fallout.  But the CI knows better, usually.
>
> While I worked on my patch (which ended up looking similar to yours) I also
> noticed that some examples now use SELECT_VL where we didn't before, and,
> they appeared reasonable to me.  Definitely saw no execution failures either.
>
> So, I'd say let's go ahead.  Once it is in we can deal with the fallout.
> Same as the LOAD_LANES fallout that I wanted to take care of as soon as
> our internal matters permit.
> Thanks for fixing it.

r15-3712-g5e3a4a01785e2d

Richard.
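For readers unfamiliar with .SELECT_VL, the following is a rough scalar model of a length-controlled loop, not vectorizer output. It assumes the selected length is simply MIN (remaining, VF); a real target may pick a smaller value. `select_vl_model` and `add_arrays_length_controlled` are hypothetical names used only here.

```c
#include <stddef.h>

/* Model of the length selection: at most VF elements per iteration and
   never more than what remains.  Assumption: the target picks exactly
   the minimum; real hardware may choose less.  */
static size_t
select_vl_model (size_t remaining, size_t vf)
{
  return remaining < vf ? remaining : vf;
}

/* Adds A and B element-wise into DST in chunks chosen by the model above
   and returns the number of "vector" iterations.  VF must be non-zero.  */
static size_t
add_arrays_length_controlled (int *dst, const int *a, const int *b,
                              size_t n, size_t vf)
{
  size_t iterations = 0;
  for (size_t i = 0; i < n; )
    {
      size_t vl = select_vl_model (n - i, vf);
      for (size_t j = 0; j < vl; j++)
        dst[i + j] = a[i + j] + b[i + j];
      i += vl;
      iterations++;
    }
  return iterations;
}
```

The point of the model is that the loop needs no scalar epilogue: the last iteration just runs with a shorter length.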
Re: [PATCH v11] ada: fix timeval timespec on 32 bits archs with 64 bits time_t [PR114065]
Nicolas Boulenguez writes:
> PR ada/114065
>
> Hello.
> Any news about these patches?

Hello,

Sorry about the delay. Arnaud already replied on BZ, but I'll add a few remarks.

In 0001-Ada-merge-all-timeval-and-timespec-definitions-and-c.patch:

> - -- C timeval represent a duration (used in Select for example). This
> - -- structure is composed of a number of seconds and a number of micro
> - -- seconds. The timeval structure is not exposed here because its
> - -- definition is target dependent. Interface to C programs is done via a
> - -- pointer to timeval structure.
> +   function To_Duration (T : not null access timeval)
> +     return System.C_Time.Non_Negative_Duration
> +   with Inline
> +   is (System.C_Time.To_Duration (T.all));
> +   -- Deprecated. Please use C_Time directly.

The aspect "with Inline" is incorrect, it should be last, after the
return expression. The above does not build.

The obvious fix would be:

   function To_Duration (T : not null access timeval)
     return System.C_Time.Non_Negative_Duration
   is (System.C_Time.To_Duration (T.all))
   with Inline;

In 0007-Ada-drop-unneeded-darwin-solaris-x32-variants-of-Sys.patch:

> diff --git a/gcc/ada/Makefile.rtl b/gcc/ada/Makefile.rtl
> index 82d01b2..1f339f3 100644
> @@ -2619,7 +2619,7 @@ ifeq ($(strip $(filter-out %x32 linux%,$(target_cpu) $(target_os))),)
> s-mudido.adbs-osinte.adss-osinte.adb - s-osprim.adb + s-osprim.adbs-taprop.adbs-tasinf.adss-tasinf.adb
> @@ -2703,7 +2703,7 @@ ifeq ($(strip $(filter-out darwin%,$(target_os))),)
> ifeq ($(strip $(filter-out %86,$(target_cpu))),)
> LIBGNAT_TARGET_PAIRS += \
> s-intman.adb - s-osprim.adb + s-osprim.adb$(ATOMICS_TARGET_PAIRS) \
> system.ads
> @@ -2722,7 +2722,7 @@ ifeq ($(strip $(filter-out darwin%,$(target_os))),)
> ifeq ($(strip $(filter-out %x86_64,$(target_cpu))),)
> LIBGNAT_TARGET_PAIRS += \
> s-intman.adb - s-osprim.adb + s-osprim.adba-exetim.adsa-exetim.adb$(ATOMICS_TARGET_PAIRS) \
> @@ -2769,7 +2769,7 @@ ifeq ($(strip $(filter-out darwin%,$(target_os))),)
> ifeq ($(strip $(filter-out arm,$(target_cpu))),)
> LIBGNAT_TARGET_PAIRS += \
> s-intman.adb - s-osprim.adb + s-osprim.adb$(ATOMICS_TARGET_PAIRS) \
> $(ATOMICS_BUILTINS_TARGET_PAIRS)
>
> @@ -2782,7 +2782,7 @@ ifeq ($(strip $(filter-out darwin%,$(target_os))),)
> a-nallfl.adss-intman.adbs-dorepr.adb - s-osprim.adb + s-osprim.adb$(ATOMICS_TARGET_PAIRS) \
> $(ATOMICS_BUILTINS_TARGET_PAIRS) \
> $(GNATRTL_128BIT_PAIRS)
>
> diff --git a/gcc/ada/libgnat/s-osprim__rtems.adb b/gcc/ada/libgnat/s-osprim__rtems.adb
> index f7b607a..6116345 100644
> --- a/gcc/ada/libgnat/s-osprim__rtems.adb
> +++ b/gcc/ada/libgnat/s-osprim__rtems.adb
> @@ -29,7 +29,7 @@
> --
> --
>
> --
>
> --- This version is for POSIX-like operating systems
> +-- This version is for POSIX-like operating systems, Darwin and Linux/x32.
>
> with System.C_Time;

It may be surprising to have the RTEMS file used by other OS. The
original comment should have mentioned that in the first place, but the
file was only used with RTEMS. With your change, the file is effectively
shared, so it would be best to rename it.

Tests are running and I'll report any issue we may find.

Thank you for your patience,
Marc
Re: [PATCH v11] ada: fix timeval timespec on 32 bits archs with 64 bits time_t [PR114065]
> In 0001-Ada-merge-all-timeval-and-timespec-definitions-and-c.patch:
>
> > - -- C timeval represent a duration (used in Select for example). This
> > - -- structure is composed of a number of seconds and a number of micro
> > - -- seconds. The timeval structure is not exposed here because its
> > - -- definition is target dependent. Interface to C programs is done via a
> > - -- pointer to timeval structure.
> > +   function To_Duration (T : not null access timeval)
> > +     return System.C_Time.Non_Negative_Duration
> > +   with Inline
> > +   is (System.C_Time.To_Duration (T.all));
> > +   -- Deprecated. Please use C_Time directly.
>
> The aspect "with Inline" is incorrect, it should be last, after the
> return expression. The above does not build.
>
> The obvious fix would be:
>
>    function To_Duration (T : not null access timeval)
>      return System.C_Time.Non_Negative_Duration
>    is (System.C_Time.To_Duration (T.all))
>    with Inline;

Note that expression functions are already marked inline-unless-impossible
so you should simply drop the Inline aspect.

> It may be surprising to have the RTEMS file used by other OS. The
> original comment should have mentioned that in the first place, but the
> file was only used with RTEMS. With your change, the file is effectively
> shared, so it would be best to rename it.

Agreed.
RE: [PATCH] RISC-V: testsuite: Fix SELECT_VL SLP fallout.
Thanks Robin.

> I think those tests don't really need to check for vsetvl anyway.

It looks like scanning the asm only for the RVV fixed-point insn is good enough for the vector part, which is somewhat different from the scalar one. I will make the change after this patch is pushed.

Pan

-----Original Message-----
From: Robin Dapp
Sent: Thursday, September 19, 2024 9:25 PM
To: gcc-patches
Cc: pal...@dabbelt.com; kito.ch...@gmail.com; juzhe.zh...@rivai.ai; jeffreya...@gmail.com; Li, Pan2; rdapp@gmail.com
Subject: [PATCH] RISC-V: testsuite: Fix SELECT_VL SLP fallout.

Hi,

this fixes asm-scan fallout from r15-3712-g5e3a4a01785e2d where we allow
SLP with SELECT_VL.

Assisted by sed and regtested on rv64gcv_zvfh_zvbb.

Rather lengthy but obvious, so going to commit after a while if the CI is
happy.  I think those tests don't really need to check for vsetvl anyway,
not all of them at least but I didn't change that for now.

Regards
 Robin

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/rvv/autovec/binop/vec_sat_s_add-1.c: Expect length-controlled loop.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_s_add-2.c: Ditto.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_s_add-3.c: Ditto.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_s_add-4.c: Ditto.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-1.c: Ditto.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-10.c: Ditto.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-11.c: Ditto.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-12.c: Ditto.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-13.c: Ditto.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-14.c: Ditto.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-15.c: Ditto.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-16.c: Ditto.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-17.c: Ditto.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-18.c: Ditto.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-19.c: Ditto.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-2.c: Ditto.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-20.c: Ditto.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-21.c: Ditto.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-22.c: Ditto.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-23.c: Ditto.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-24.c: Ditto.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-25.c: Ditto.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-26.c: Ditto.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-27.c: Ditto.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-28.c: Ditto.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-29.c: Ditto.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-3.c: Ditto.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-30.c: Ditto.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-31.c: Ditto.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-32.c: Ditto.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-4.c: Ditto.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-5.c: Ditto.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-6.c: Ditto.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-7.c: Ditto.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-8.c: Ditto.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-9.c: Ditto.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-1.c: Ditto.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-10.c: Ditto.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-11.c: Ditto.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-12.c: Ditto.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-13.c: Ditto.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-14.c: Ditto.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-15.c: Ditto.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-16.c: Ditto.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-17.c: Ditto.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-18.c: Ditto.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-19.c: Ditto.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-2.c: Ditto.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-20.c: Ditto.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-21.c: Ditto.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-22.c: Ditto.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-23.c: Ditto.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-24.c: Ditto.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-25.c: Ditto.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-26.c: Ditto.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-27.c: Ditto.
Re: [PATCH RFC] build: update bootstrap req to C++14
On 9/19/24 7:57 AM, Richard Biener wrote:
> On Wed, Sep 18, 2024 at 6:22 PM Jason Merrill wrote:
>> Tested x86_64-pc-linux-gnu with 5.5.0 bootstrap compiler. Thoughts?
>
> I'm fine with this in general - do we need to bump the requirement
> for GCC 15 though? IMO we should bump once we are requiring actual
> C++14 in some place.

Jakub's dwarf2asm patch yesterday uses C++14 if available, and I remember
seeing a couple of other patches that would have been simpler with C++14
available.

> As for the version requirement, as you say only some minor versions of
> the GCC 5 series are OK, so I would suggest to say we recommend using
> GCC 6 or later but GCC 5.5 should also work?

Aren't we already specifying a minor revision with 4.8.3 for C++11?

Another possibility would be to just say GCC 5, and adjust that upward
if we run into problems.

Jason
Re: [Patch] OpenMP: Add get_device_from_uid/omp_get_uid_from_device routines
Hi Andre,

thanks for reading the patch + commenting.

Andre Vehreschild wrote:
> in the changelog of libgomp:
>
>   * fortran.c (omp_get_uid_from_device_,
>   omp_get_uid_from_device_8_): Add.
>
> "Add." what? Can you be more specific, i.e. is it just a dummy or prototype?

Neither. It is a full implementation (that is a wrapper to the target.c
function, directly called by C/C++).

The prototype used by fortran.c is 'omp.h.in' (i.e. the C/C++ header file,
also used by user code) and for Fortran code of users, it is the module
generated from 'omp_lib.f90.in' and the (deprecated) include file
'omp_lib.h.in'.

The purpose of fortran.c in general – and also for the added code – is to
be a wrapper between the Fortran API/ABI and the C ABI.

In the current case, there are two reasons for the two functions:

(a) The result type is 'character(:), pointer' – but the C function just
returns a '\0' terminated const char*. Hence, the wrapper function contains
a '*result_len = strlen (*result);' besides the '*result = '

(b) The argument is an 'integer'. As we want to be compatible with
-fdefault-integer-8, previously somewhat fashionable, we have an 'int32_t'
and an 'int64_t' version of the function – which needs a second wrapper
function.

As for the other API routine, as a BIND(C) makes it call the C function,
no wrapper is needed.

* * *

[Typo: missing 'a' – noted + will fix.]

* * *

> +@item The unique identifier (UID), used with OpenMP's API UID routine, consists
> +  of the @samp{GPU-} prefix followed by the 16-bytes UUID as returned by
> +  the CUDA runtime library. This UUID is output in grouped lower-case
> +  hex digits; the grouping of those 32 digits is: 8 digits, hyphen,
> +  4 digits, hyphen, 4 digits, hyphen, 16 digits. The output matches the
> +  format used by @code{nvidia-smi}.
>  @end itemize
>
> Do I get this right, that for CUDA this is, e.g. GPU-0123456789abdcef ? Then why
> is the "normal" UUID display format described here? This confuses me. (Just
> curiosity.)

For AMD, it is the following type of string, which contains an 8 bytes/16
hex-digits UUID part: 'GPU-abcef0123456789'. While for Nvidia it is
'GPU-abcdef12-1234-1234-01234567890abcd', consisting of a 16 bytes/32
hex-digits UUID.

For AMD, we directly get the string, matching what "rocminfo" shows as UUID.

For Nvidia, we don't get a string but a 'char bytes[16]' array filled with
the values, which we print each as '%02x' hex digit. For the output,
additionally, a "GPU-" prefix is added + a few hyphens. That's to mimic what
'nvidia-smi -a' outputs.

I admit it is slightly confusing – and when reading the .texi, it is also
easy to miss that one part talks about AMD ("GCN") GPUs and the other about
NVidia GPUs.
→ https://gcc.gnu.org/onlinedocs/libgomp/Offload-Target-Specifics.html

(In terms of OpenMP, it is only a unique identifier; it does not need to be
universally unique [and also isn't for the host]; AMD and Nvidia call it UUID
and it looks rather unique for the GPU; rocminfo also outputs an "UUID" for
the CPU but that's just "CPU-XX" (twice for a dual socket system, i.e. not
even unique), but we don't use this output.)

> Er, and when I read further on, I find the nvptx implementation and that
> contradicts the description. There a "normal" UUID is added to the GPU- id.
> Now I am confused.

What description contradicts which one?

Tobias
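To make the Nvidia formatting concrete, here is a small standalone sketch of the scheme as described in this thread: 16 bytes printed as '%02x', a "GPU-" prefix, and hyphens giving the 8-4-4-16 grouping stated in the quoted .texi text. It is not the plugin code; `format_gpu_uid` and `GPU_UID_BUFSZ` are hypothetical names, and the grouping is taken from the description above rather than checked against nvidia-smi.

```c
#include <stddef.h>
#include <stdio.h>

/* "GPU-" + 32 hex digits + 3 hyphens + terminating NUL.  */
#define GPU_UID_BUFSZ 40

/* Hypothetical formatter: writes BYTES as lower-case hex into OUT using
   the 8-4-4-16 digit grouping described in the quoted documentation.  */
static void
format_gpu_uid (const unsigned char bytes[16], char out[GPU_UID_BUFSZ])
{
  size_t pos = (size_t) snprintf (out, GPU_UID_BUFSZ, "GPU-");
  for (int i = 0; i < 16; i++)
    {
      /* Hyphens go after 8, 12 and 16 hex digits, i.e. before
         bytes 4, 6 and 8.  */
      if (i == 4 || i == 6 || i == 8)
        out[pos++] = '-';
      pos += (size_t) snprintf (out + pos, GPU_UID_BUFSZ - pos,
                                "%02x", bytes[i]);
    }
}
```

With the bytes 0x00 through 0x0f as input, this produces a 39-character string with the hyphens in the positions the description gives.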
Re: [PATCH] c++: ICE with structured bindings and m-d array [PR102594]
Ping.

On Thu, Sep 05, 2024 at 06:32:28PM -0400, Marek Polacek wrote:
> Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk/14?
>
> -- >8 --
> We ICE in decay_conversion with this test:
>
>   struct S {
>     S() {}
>   };
>   S arr[1][1];
>   auto [m](arr3);
>
> But not when the last line is:
>
>   auto [n] = arr3;
>
> Therefore the difference is between copy- and direct-init.  In
> particular, in build_vec_init we have:
>
>   if (direct_init)
>     from = build_tree_list (NULL_TREE, from);
>
> and then we call build_vec_init again with init==from.  Then
> decay_conversion gets the TREE_LIST and it crashes.
>
> build_aggr_init has:
>
>   /* Wrap the initializer in a CONSTRUCTOR so that build_vec_init
>      recognizes it as direct-initialization.  */
>   init = build_constructor_single (init_list_type_node,
>                                    NULL_TREE, init);
>   CONSTRUCTOR_IS_DIRECT_INIT (init) = true;
>
> so I propose to do the same in build_vec_init.
>
> 	PR c++/102594
>
> gcc/cp/ChangeLog:
>
> 	* init.cc (build_vec_init): Build up a CONSTRUCTOR to signal
> 	direct-initialization rather than a TREE_LIST.
>
> gcc/testsuite/ChangeLog:
>
> 	* g++.dg/cpp1z/decomp61.C: New test.
> ---
>  gcc/cp/init.cc                        |  8 +++-
>  gcc/testsuite/g++.dg/cpp1z/decomp61.C | 28 +++
>  2 files changed, 35 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/g++.dg/cpp1z/decomp61.C
>
> diff --git a/gcc/cp/init.cc b/gcc/cp/init.cc
> index be7fdb40dd6..f785015e477 100644
> --- a/gcc/cp/init.cc
> +++ b/gcc/cp/init.cc
> @@ -4958,7 +4958,13 @@ build_vec_init (tree base, tree maxindex, tree init,
>  	  if (xvalue)
>  	    from = move (from);
>  	  if (direct_init)
> -	    from = build_tree_list (NULL_TREE, from);
> +	    {
> +	      /* Wrap the initializer in a CONSTRUCTOR so that
> +		 build_vec_init recognizes it as direct-initialization.  */
> +	      from = build_constructor_single (init_list_type_node,
> +					       NULL_TREE, from);
> +	      CONSTRUCTOR_IS_DIRECT_INIT (from) = true;
> +	    }
>  	}
>        else
>  	from = NULL_TREE;
> diff --git a/gcc/testsuite/g++.dg/cpp1z/decomp61.C b/gcc/testsuite/g++.dg/cpp1z/decomp61.C
> new file mode 100644
> index 000..ad0a20c1add
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/cpp1z/decomp61.C
> @@ -0,0 +1,28 @@
> +// PR c++/102594
> +// { dg-do compile { target c++17 } }
> +
> +struct S {
> +  S() {}
> +};
> +S arr1[2];
> +S arr2[2][1];
> +S arr3[1][1];
> +auto [m](arr3);
> +auto [n] = arr3;
> +
> +struct X {
> +  int i;
> +};
> +
> +void
> +g (X x)
> +{
> +  auto [a, b](arr2);
> +  auto [c, d] = arr2;
> +  auto [e, f] = (arr2);
> +  auto [i, j](arr1);
> +  auto [k, l] = arr1;
> +  auto [m, n] = (arr1);
> +  auto [z] = x;
> +  auto [y](x);
> +}
>
> base-commit: b567e5ead5d54f022c57b48f31653f6ae6ece007
> --
> 2.46.0
>

Marek
[Patch][v2] OpenMP: Add get_device_from_uid/omp_get_uid_from_device routines
Minor update – addressing the issues that Andre raised (thanks!):

'Add.' → 'New functions.' in the ChangeLog for 'fortran.c' and otherwise
libgomp.texi changes, only: A bunch of typo fixes (preexisting and in the
new text). I also added a made-up example UUID for the GPUs, which should
help to reduce confusion.

Any additional comments or suggestions?

Tobias

Tobias Burnus wrote:
> in order to know and potentially re-use a specific offload device
> (reproducibility, affinity wise close to a CPU (socket), …) a mapping
> between an (universal?) unique identifier and the OpenMP device number
> is useful. Thus, TR13 added support for it.
>
> This is a collateral patch caused by looking at the API routines for
> other reasons and looking at that part of the spec during the OpenMP F2F.
>
> Besides the added API routines, the UID will be used elsewhere:
> * In context selectors: 'target_device' supports 'uid()'.
> * In the OMP_AVAILABLE_DEVICES and OMP_DEFAULT_DEVICE env vars.
>
> @Sandra: Besides the usual .texi part, for the 'target_device' trait set:
> if you add a new GOMP routine for kind/arch/isa - can you also add a UID
> argument so that we don't have to update the API when it is needed in the
> not-so-distant future?
>
> @Andrew + @Thomas: Any comment? Especially to the nvptx/gcn side
> (plugin + .texi)?
>
> @Jakub or anyone else — any comments, suggestions, remarks?
>
> [The patch was tested without GPUs, with one Nvidia GPU and one AMD GPU
> and seems to work fine.]

OpenMP: Add get_device_from_uid/omp_get_uid_from_device routines

Those TR13/OpenMP 6.0 routines permit a reproducible offloading to
a specific device by mapping an OpenMP device number to a
unique ID (UID). The GPU device UIDs should be universally unique,
the one for the host is not.

gcc/ChangeLog:

	* omp-general.cc (omp_runtime_api_procname): Add
	get_device_from_uid and omp_get_uid_from_device routines.

include/ChangeLog:

	* cuda/cuda.h (cuDeviceGetUuid): Declare.
	(cuDeviceGetUuid_v2): Add prototype.

libgomp/ChangeLog:

	* config/gcn/target.c (omp_get_uid_from_device,
	omp_get_device_from_uid): Add stub implementation.
	* config/nvptx/target.c (omp_get_uid_from_device,
	omp_get_device_from_uid): Likewise.
	* fortran.c (omp_get_uid_from_device_,
	omp_get_uid_from_device_8_): New functions.
	* libgomp-plugin.h (GOMP_OFFLOAD_get_uid): Add prototype.
	* libgomp.h (struct gomp_device_descr): Add 'uid' and 'get_uid_func'.
	* libgomp.map (GOMP_6.0): New, including the new UID routines.
	* libgomp.texi (OpenMP Technical Report 13): Mark UID routines as 'Y'.
	(Device Information Routines): Document new UID routines.
	(Offload-Target Specifics): Document UID format.
	* omp.h.in (omp_get_device_from_uid, omp_get_uid_from_device): New
	prototype.
	* omp_lib.f90.in (omp_get_device_from_uid, omp_get_uid_from_device):
	New interface.
	* omp_lib.h.in: Likewise.
	* plugin/cuda-lib.def: Add cuDeviceGetUuid and cuDeviceGetUuid_v2 via
	CUDA_ONE_CALL_MAYBE_NULL.
	* plugin/plugin-gcn.c (GOMP_OFFLOAD_get_uid): New.
	* plugin/plugin-nvptx.c (GOMP_OFFLOAD_get_uid): New.
	* target.c (str_omp_initial_device): New static var.
	(STR_OMP_DEV_PREFIX): Define.
	(gomp_get_uid_for_device, omp_get_uid_from_device,
	omp_get_device_from_uid): New.
	(gomp_load_plugin_for_device): DLSYM_OPT the function 'get_uid'.
	(gomp_target_init): Set the device's 'uid' field to NULL.
	* testsuite/libgomp.c/device_uid.c: New test.
	* testsuite/libgomp.fortran/device_uid.f90: New test.

 gcc/omp-general.cc                               |  4 +-
 include/cuda/cuda.h                              |  7 ++
 libgomp/config/gcn/target.c                      | 14
 libgomp/config/nvptx/target.c                    | 14
 libgomp/fortran.c                                | 15
 libgomp/libgomp-plugin.h                         |  1 +
 libgomp/libgomp.h                                |  2 +
 libgomp/libgomp.map                              |  8 +++
 libgomp/libgomp.texi                             | 89 ++--
 libgomp/omp.h.in                                 |  3 +
 libgomp/omp_lib.f90.in                           | 23 ++
 libgomp/omp_lib.h.in                             | 23 ++
 libgomp/plugin/cuda-lib.def                      |  2 +
 libgomp/plugin/plugin-gcn.c                      | 16 +
 libgomp/plugin/plugin-nvptx.c                    | 34 +
 libgomp/target.c                                 | 56 +++
 libgomp/testsuite/libgomp.c/device_uid.c         | 38 ++
 libgomp/testsuite/libgomp.fortran/device_uid.f90 | 42 +++
 18 files changed, 384 insertions(+), 7 deletions(-)

diff --git a/gcc/omp-general.cc b/gcc/omp-general.cc
index de91ba8a4a7..12788ad0249 100644
--- a/gcc/omp-general.cc
+++ b/gcc/omp-general.cc
@@ -3260,6 +3260,7 @@ omp_runtime_api_procname (const char *name)
       "alloc",
       "calloc",
       "free",
+      "get_device_from_uid",
       "get_interop_int",
       "get_interop_ptr",
       "get_mapped_ptr",
@@ -3338,12 +3339,13 @
Re: [Patch, Fortran] Implement Unsigned for SUM and PRODUCT
On 19.09.24 at 11:55, Andre Vehreschild wrote:
> Hi Thomas,
>
> thanks for the patch. I have one proposal/question and one missing verb (IMO).
> Else the patch looks fine to me. Ok for trunk.
>
> diff --git a/gcc/fortran/gfortran.texi b/gcc/fortran/gfortran.texi
> index 829ab00c665..e5ffe67 100644
> --- a/gcc/fortran/gfortran.texi
> +++ b/gcc/fortran/gfortran.texi
> @@ -2788,7 +2788,7 @@ As of now, the following intrinsics take unsigned arguments:
>  @item @code{MVBITS}
>  @item @code{RANGE}
>  @item @code{TRANSFER}
> -@item @code{MATMUL} and @code{DOT_PRODUCT}
> +@item @code{SUM}, @code{PRODUCT}, @code{MATMUL} and @code{DOT_PRODUCT}
>
> How about sorting those alphabetically and putting each on a separate line?
> This might make it more viewable. Just a suggestion.

I tried to group them somewhat logically, but you're right, this
may be better. Eventually, I want to document the UNSIGNED arguments
to all intrinsics so they are in the right place. I think I will
re-sort this after all intrinsics have been finished.

>  @end itemize
>  This list will grow in the near future.
>  @c -
> diff --git a/gcc/fortran/iresolve.cc b/gcc/fortran/iresolve.cc
> index 32b31432e58..92a591cf6d7 100644
> --- a/gcc/fortran/iresolve.cc
> +++ b/gcc/fortran/iresolve.cc
> @@ -175,9 +175,11 @@ resolve_bound (gfc_expr *f, gfc_expr *array, gfc_expr *dim, gfc_expr *kind,
>
>  static void
>  resolve_transformational (const char *name, gfc_expr *f, gfc_expr *array,
> -			  gfc_expr *dim, gfc_expr *mask)
> +			  gfc_expr *dim, gfc_expr *mask,
> +			  bool use_integer = false)
>  {
>    const char *prefix;
> +  bt type;
>
>    f->ts = array->ts;
>
> @@ -200,9 +202,18 @@ resolve_transformational (const char *name, gfc_expr *f, gfc_expr *array,
>        gfc_resolve_dim_arg (dim);
>      }
>
> +  /* For those intrinsic like SUM where we the integer version
>
> There is a verb missing here, IMO. ... where we _use_ the ... ???

This sentence no verb, correct :-)

> +     actually uses unsigned, but we call it as the integer
> +     version.  */
> +
> +  if (use_integer && array->ts.type == BT_UNSIGNED)
> +    type = BT_INTEGER;
> +  else
> +    type = array->ts.type;
> +
>    f->value.function.name
>      = gfc_get_string (PREFIX ("%s%s_%c%d"), prefix, name,
> -		      gfc_type_letter (array->ts.type),
> +		      gfc_type_letter (type),
>  		      gfc_type_abi_kind (&array->ts));
>  }
>
> Regards and thanks for the patch,
> 	Andre

Thanks!

Best regards

	Thomas
[PATCH] tree-optimization/116768 - wrong dependence analysis
The following reverts a bogus fix done for PR101009 and instead makes
sure we get into the same_access_functions () case when computing
the distance vector for g[1] and g[1] where the constants ended up
having different types.  The generic code doesn't seem to handle
loop invariant dependences.  The special case gets us both
( 0 ) and ( 1 ) as distance vectors while formerly we got ( 1 ),
which the PR101009 fix changed to ( 0 ) with bad effects on other
cases as shown in this PR.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed to trunk
so far.

Richard.

	PR tree-optimization/116768
	* tree-data-ref.cc (build_classic_dist_vector_1): Revert
	PR101009 change.
	* tree-chrec.cc (eq_evolutions_p): Make sure (sizetype)1
	and (int)1 compare equal.

	* gcc.dg/torture/pr116768.c: New testcase.
---
 gcc/testsuite/gcc.dg/torture/pr116768.c | 32 +
 gcc/tree-chrec.cc                       |  4 ++--
 gcc/tree-data-ref.cc                    |  2 --
 3 files changed, 34 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/torture/pr116768.c

diff --git a/gcc/testsuite/gcc.dg/torture/pr116768.c b/gcc/testsuite/gcc.dg/torture/pr116768.c
new file mode 100644
index 000..57b5d00e7b7
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr116768.c
@@ -0,0 +1,32 @@
+/* { dg-do run } */
+
+#define numwords 2
+
+typedef struct {
+  unsigned words[numwords];
+} Child;
+
+typedef struct {
+  Child child;
+} Parent;
+
+Parent my_or(Parent x, const Parent *y) {
+  const Child *y_child = &y->child;
+  for (int i = 0; i < numwords; i++) {
+    x.child.words[i] |= y_child->words[i];
+  }
+  return x;
+}
+
+int main() {
+  Parent bs[4];
+  __builtin_memset(bs, 0, sizeof(bs));
+
+  bs[0].child.words[0] = 1;
+  for (int i = 1; i <= 3; i++) {
+    bs[i] = my_or(bs[i], &bs[i - 1]);
+  }
+  if (bs[2].child.words[0] != 1)
+    __builtin_abort ();
+  return 0;
+}
diff --git a/gcc/tree-chrec.cc b/gcc/tree-chrec.cc
index 8b7982a2dbe..9b272074a2e 100644
--- a/gcc/tree-chrec.cc
+++ b/gcc/tree-chrec.cc
@@ -1716,7 +1716,7 @@ eq_evolutions_p (const_tree chrec0, const_tree chrec1)
       || TREE_CODE (chrec0) != TREE_CODE (chrec1))
     return false;
 
-  if (chrec0 == chrec1)
+  if (operand_equal_p (chrec0, chrec1, 0))
     return true;
 
   if (! types_compatible_p (TREE_TYPE (chrec0), TREE_TYPE (chrec1)))
@@ -1743,7 +1743,7 @@ eq_evolutions_p (const_tree chrec0, const_tree chrec1)
 			      TREE_OPERAND (chrec1, 0));
 
     default:
-      return operand_equal_p (chrec0, chrec1, 0);
+      return false;
     }
 }
 
diff --git a/gcc/tree-data-ref.cc b/gcc/tree-data-ref.cc
index 0f173e8803a..de234c65e94 100644
--- a/gcc/tree-data-ref.cc
+++ b/gcc/tree-data-ref.cc
@@ -5223,8 +5223,6 @@ build_classic_dist_vector_1 (struct data_dependence_relation *ddr,
 	      non_affine_dependence_relation (ddr);
 	      return false;
 	    }
-	  else
-	    *init_b = true;
 	}
 
   return true;
-- 
2.43.0
Re: [PATCH RFC] build: update bootstrap req to C++14
On Thu, 2024-09-19 at 10:21 -0400, Jason Merrill wrote:

/* snip */

> Another possibility would be to just say GCC 5, and adjust that upward
> if we run into problems.

I'd like to point out that GCC 5.1 is known to be unable to bootstrap
recent GCC releases due to PR 65801.

-- 
Xi Ruoyao
School of Aerospace Science and Technology, Xidian University
Re: [PATCH 0/8] [RFC] Introduce floating point fetch_add builtins
On Thu, 19 Sept 2024 at 14:12, wrote: > > From: Matthew Malcomson > > Hello, this is an RFC for adding an atomic floating point fetch_add builtin > (and variants) to GCC. The atomic fetch_add operation is defined to work > on the base floating point types in the C++20 standard chapter 31.7.3, and > extended to work for all cv-unqualified floating point types in C++23 > chapter 33.5.7.4. > > Honestly not sure who to Cc, please do point me to someone else if that's > better. > > This is nowhere near complete (for one thing even the tests I've added > don't fully pass), but I think I have a complete enough idea that it's > worth checking if this is something that could be agreed on. > > As it stands no target except the nvptx backend would natively support > these operations. > > Main questions that I'm looking to resolve with this RFC: > 1) Would GCC be OK accepting this implementation even though no backend >would be implementing these yet? >- AIUI only the nvptx backend could theoretically implement this. >- Even without a backend implementing it natively, the ability to use > this in code (especially libstdc++) enables other compilers to > generate better code for GPU's using standard C++. > 2) Would libstdc++ be OK relying on `__has_builtin(__atomic_fetch_add_fp)` >(i.e. a check on the resolved builtin rather than the more user-facing >one) in order to determine whether floating point atomic fetch_add is >available. Yes, if that name is what other compilers will also use (have you discussed this with Clang?) It looks like PATCH 5/8 only uses the _fp name for fetch_add though, and just uses fetch_sub etc. for the other functions, is that a mistake? >- N.b. this builtin is actually the builtin working on the "double" OK, so the library code just calls the generic __atomic_fetch_add that accepts any types, but then that gets expanded to a more specific form for float, double etc.? 
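For context, without a native floating-point fetch_add an atomic add on a double is usually written as a compare-and-swap loop. The following C11 sketch shows that fallback shape; it is neither the proposed builtin nor libstdc++ code, and `fetch_add_double` is a hypothetical name.

```c
#include <stdatomic.h>

/* Hypothetical CAS-loop fallback: atomically adds ARG to *OBJ and
   returns the value *OBJ held immediately before the addition.  */
static double
fetch_add_double (_Atomic double *obj, double arg)
{
  double old = atomic_load_explicit (obj, memory_order_relaxed);
  /* On failure OLD is refreshed with the current value, so the sum is
     recomputed on each retry.  */
  while (!atomic_compare_exchange_weak_explicit (obj, &old, old + arg,
                                                 memory_order_seq_cst,
                                                 memory_order_relaxed))
    ;
  return old;
}
```

A dedicated builtin would let a target that has a native instruction for this operation avoid the retry loop, which is the motivation given in the RFC.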
And the more specific form has to exist at some level, because we need an extern symbol from libatomic, so either we include the type as an explicit suffix on the name, or we use some kind of name mangling like _Z18__atomic_fetch_addPdS_S_, which is obviously nasty. > type, one would have to rely on any compilers implementing that > particular resolved builtin to also implement the other floating point > atomic fetch_add builtins that they would want to support in libstdc++ > `atomic<[floating_point_type]>::fetch_add`. This seems a bit concerning. I can imagine somebody implementing these for float and double first, but leaving long double, _Float64, _Float32, _Float128 etc. for later. In that case, libstdc++ would not work if somebody tries to use std::atomic, or whichever types aren't supported yet. It's OK if we can be *sure* that won't happen i.e. that Clang will either implement the new built-in for *all* FP types, or none. > > More specific questions about the choice of which builtins to implement and > whether the types are OK: > 1) Is it OK to not implement the `__sync_*` versions? >Since these are deprecated and the `__atomic_*` versions are there to >match the C/C++ code atomic operations (which is a large part of the >reason for the new floating point operations). > 2) Is it OK to not implement all the `fetch_*` operations? >None of the bitwise operations are specified for C++ and bitwise >operations are (AIUI) rarely used on floating point values. That seems OK (entirely correct even) to me. > 3) Wanting to be able to farm out to libatomic meant that we need constant > names >for the specialised functions. >- This led to the naming convention based on floating point type. >- That naming convention *could* be updated to include the special backend > floating point types if needed. I have not done this mostly because I > thought it would not add much, though I have not looked into this very > closely. 
> 4) Wanting to name the functions based on floating point type rather than size >meant that the mapping from type passed to the overloaded version to >specialised builtin was less direct than for the integral versions. >- Ended up with a hard-coded table in the source to check this. >- Would very much like some better approach, not certain what better > approach > I could find. >- Will eventually at least share the hard-coded tables (which isn't > happening yet because this is at RFC level). > 5) Are there any other types that I should use? >Similarly are there any types that I'm trying to use that I shouldn't? >I *believe* I've implemented all the types that make sense and are >general builtin types. Could easily have missed some (e.g. left >`_Float128x` alone because AIUI it's larger than 128bits which means we >don't have any other atomic operations on such data), could also easily That seems like a problem though - it means that GCC could be in exac
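For readers following the thread: when no target implements a floating point fetch_add natively, the usual fallback is a compare-exchange loop. The snippet below is only a minimal sketch of that idea in standard C++; `fetch_add_cas` is a hypothetical name, and nothing here is taken from the RFC patches or from libatomic.

```cpp
#include <atomic>

// Hypothetical helper (not part of the RFC): emulate an atomic
// floating point fetch_add with a compare-exchange loop.
double fetch_add_cas(std::atomic<double> &obj, double arg,
                     std::memory_order order = std::memory_order_seq_cst)
{
  // Start from a snapshot of the current value.
  double expected = obj.load(std::memory_order_relaxed);
  // On failure, compare_exchange_weak stores the value it observed into
  // `expected`, so the sum is recomputed on every retry.
  while (!obj.compare_exchange_weak(expected, expected + arg, order,
                                    std::memory_order_relaxed))
    {
    }
  // Like the integral fetch_add, return the value held before the addition.
  return expected;
}
```

A builtin that resolves per floating point type would let a backend replace this loop with a single instruction where the hardware has one, which is the motivation given above for the nvptx case.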
Re: [PATCH RFC] build: update bootstrap req to C++14
On Thu, Sep 19, 2024 at 10:21:15AM -0400, Jason Merrill wrote: > On 9/19/24 7:57 AM, Richard Biener wrote: > > On Wed, Sep 18, 2024 at 6:22 PM Jason Merrill wrote: > > > > > > Tested x86_64-pc-linux-gnu with 5.5.0 bootstrap compiler. Thoughts? > > > > I'm fine with this in general - do we have needs of bumping the requirement > > for > > GCC 15 though? IMO we should bump once we are requiring actual C++14 > > in some place. > > Jakub's dwarf2asm patch yesterday uses C++14 if available, and I remember And libcpp too. > seeing a couple of other patches that would have been simpler with C++14 > available. It was just a few lines and if I removed the now never true HAVE_DESIGNATED_INITIALIZERS cases, it wouldn't even add any new lines, just change some to others. Both of those patches were just minor optimizations, it is fine if they don't happen during stage1. We also have some spots with #if __cpp_inline_variables < 201606L #else #endif conditionals but that doesn't mean we need to bump to C++17. Sure, bumping the required C++ version means we can remove the corresponding conditionals, and more importantly stop worrying about working around GCC 4.8.x/4.9 bugs (I think that is actually more important). The price is that we either stop using some of the cfarm machines for testing, or use IBM Advanced Toolchain or a hand-built GCC 14 as the system compiler there. At some point we certainly want to do that, the question is if the benefits right now outweigh the pain. > > As of the version requirement as you say only some minor versions of the > > GCC 5 > > series are OK I would suggest to say we recommend using GCC 6 or later > > but GCC 5.5 should also work? > > Aren't we already specifying a minor revision with 4.8.3 for C++11? > > Another possibility would be to just say GCC 5, and adjust that upward if we > run into problems.
I think for the oldest supported version we need some CFarm machines around with that compiler so that all people can actually test issues with it. Dunno which distros shipped GCC 5 in long term support versions if any and at which minor those are. Jakub
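As an illustration of the kind of conditional mentioned above, here is a minimal sketch of a feature-test guard that uses a newer language feature when the compiler offers it and falls back otherwise. The names `kTableSize`, `uses_inline_variable` and `table_size` are made up for this example and do not come from the GCC sources.

```cpp
// Illustrative only: all names here are hypothetical.
#if defined(__cpp_inline_variables) && __cpp_inline_variables >= 201606L
// C++17 and later: a single definition shared by every translation unit.
inline constexpr int kTableSize = 64;
constexpr bool uses_inline_variable = true;
#else
// Older dialects: fall back to an internal-linkage constant per
// translation unit.
static constexpr int kTableSize = 64;
constexpr bool uses_inline_variable = false;
#endif

// Code using the constant does not care which branch was taken.
int table_size() { return kTableSize; }
```

The guard keeps the code buildable with an older bootstrap compiler, which is why such conditionals alone do not force a bump of the required standard; raising the requirement only lets the `#else` branch be deleted.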
[PATCH v5] c++: deleting explicitly-defaulted functions [PR116162]
On Tue, Sep 17, 2024 at 12:50:46PM -0400, Jason Merrill wrote: > On 9/16/24 7:14 PM, Marek Polacek wrote: > > +/* Mark an explicitly defaulted function FN as =deleted and warn. > > + IMPLICIT_FN is the corresponding special member function that > > + would have been implicitly declared. */ > > + > > +void > > +maybe_delete_defaulted_fn (tree fn, tree implicit_fn) > > +{ > > + if (DECL_ARTIFICIAL (fn) || !DECL_DEFAULTED_IN_CLASS_P (fn)) > > +return; > > + > > + DECL_DELETED_FN (fn) = true; > > + > > + if (!warn_defaulted_fn_deleted) > > +return; > > The flag shouldn't affect the error cases; I'd drop this check. Dropped. > > + auto_diagnostic_group d; > > + const special_function_kind kind = special_function_p (fn); > > + tree parmtype > > += TREE_VALUE (DECL_XOBJ_MEMBER_FUNCTION_P (fn) > > + ? TREE_CHAIN (TYPE_ARG_TYPES (TREE_TYPE (fn))) > > + : FUNCTION_FIRST_USER_PARMTYPE (fn)); > > + const bool illformed_p > > +/* [dcl.fct.def.default] "if F1 is an assignment operator"... */ > > += (SFK_ASSIGN_P (kind) > > + /* "and the return type of F1 differs from the return type of F2" > > */ > > + && (!same_type_p (TREE_TYPE (TREE_TYPE (fn)), > > +TREE_TYPE (TREE_TYPE (implicit_fn))) > > + /* "or F1's non-object parameter type is not a reference, > > + the program is ill-formed" */ > > + || !TYPE_REF_P (parmtype))); > > + /* Decide if we want to emit a pedwarn, error, or a warning. */ > > + diagnostic_t diag_kind; > > + if (cxx_dialect >= cxx20) > > +diag_kind = illformed_p ? DK_ERROR : DK_WARNING; > > + else > > +diag_kind = DK_PEDWARN; > > Error should be errors in all standard modes; it doesn't make sense to have > a softer diagnostic in an older mode when it's ill-formed in all. > > Non-errors should be warnings or pedwarns depending on the standard mode. Aaah, I misunderstood. Hopefully I got it right this time. > > + /* Don't warn for template instantiations. 
*/ > > + if (DECL_TEMPLATE_INSTANTIATION (fn) && diag_kind == DK_WARNING) > > +return; > > + > > + const char *wmsg; > > + switch (kind) > > +{ > > +case sfk_copy_constructor: > > + wmsg = G_("explicitly defaulted copy constructor is implicitly > > deleted " > > + "because its declared type does not match the type of an " > > + "implicit copy constructor"); > > + break; > > +case sfk_move_constructor: > > + wmsg = G_("explicitly defaulted move constructor is implicitly > > deleted " > > + "because its declared type does not match the type of an " > > + "implicit move constructor"); > > + break; > > +case sfk_copy_assignment: > > + wmsg = G_("explicitly defaulted copy assignment operator is > > implicitly " > > + "deleted because its declared type does not match the type " > > + "of an implicit copy assignment operator"); > > + break; > > +case sfk_move_assignment: > > + wmsg = G_("explicitly defaulted move assignment operator is > > implicitly " > > + "deleted because its declared type does not match the type " > > + "of an implicit move assignment operator"); > > + break; > > +default: > > + gcc_unreachable (); > > +} > > + if (emit_diagnostic (diag_kind, DECL_SOURCE_LOCATION (fn), > > + OPT_Wdefaulted_function_deleted, wmsg)) > > Let's not pass the OPT when DK_ERROR. Done. I've added new tests to cover -Wno-defaulted-function-deleted. Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk? -- >8 -- This PR points out the we're not implementing [dcl.fct.def.default] properly. Consider e.g. struct C { C(const C&&) = default; }; where we wrongly emit an error, but the move ctor should be just =deleted. According to [dcl.fct.def.default], if the type of the special member function differs from the type of the corresponding special member function that would have been implicitly declared in a way other than as allowed by 2.1-4, the function is defined as deleted. There's an exception for assignment operators in which case the program is ill-formed. 
clang++ has a warning for when we delete an explicitly-defaulted function so this patch adds it too. When the code is ill-formed, we emit an error in all modes. Otherwise, we emit a pedwarn in C++17 and a warning in C++20. PR c++/116162 gcc/c-family/ChangeLog: * c.opt (Wdefaulted-function-deleted): New. gcc/cp/ChangeLog: * class.cc (check_bases_and_members): Don't set DECL_DELETED_FN here, leave it to defaulted_late_check. * cp-tree.h (maybe_delete_defaulted_fn): Declare. (defaulted_late_check): Add a tristate parameter. * method.cc (maybe_delete_defaulted_fn): New. (defaulted_late_check): Add a tristate parameter. Call maybe_delete_defaulted_fn instead of giving an error. gcc/ChangeLog: *
Re: [PATCH 5/5] arm: [MVE intrinsics] Rework MVE vld/vst intrinsics
Hi! I had not noticed that this patch makes gcc.target/arm/pr112337.c fail because __builtin_mve_vldrwq_sv4si is no longer available. Adding this fixes the problem: diff --git a/gcc/testsuite/gcc.target/arm/pr112337.c b/gcc/testsuite/gcc.target/arm/pr112337.c index 10b7881b9f9..599229c1db0 100644 --- a/gcc/testsuite/gcc.target/arm/pr112337.c +++ b/gcc/testsuite/gcc.target/arm/pr112337.c @@ -4,7 +4,9 @@ /* { dg-add-options arm_v8_1m_mve } */ #pragma GCC arm "arm_mve_types.h" -int32x4_t h(void *p) { return __builtin_mve_vldrwq_sv4si(p); } +#pragma GCC arm "arm_mve.h" false + +int32x4_t h(void *p) { return vldrwq_s32(p); } void g(int32x4_t); void f(int, int, int, short, int *p) { int *bias = p; I hope that's simple enough not to need a v2 of the patch series if everything else is OK? Thanks, Christophe On 9/16/24 11:38, Christophe Lyon wrote: From: Alfie Richards Implement the mve vld and vst intrinsics using the MVE builtins framework. The main part of the patch is to reimplement to vstr/vldr patterns such that we now have much fewer of them: - non-truncating stores - predicated non-truncating stores - truncating stores - predicated truncating stores - non-extending loads - predicated non-extending loads - extending loads - predicated extending loads This enables us to update the implementation of vld1/vst1 and use the new vldr/vstr builtins. The patch also adds support for the predicated vld1/vst1 versions. 2024-09-11 Alfie Richards Christophe Lyon gcc/ * config/arm/arm-mve-builtins-base.cc (vld1q_impl): Add support for predicated version. (vst1q_impl): Likewise. (vstrq_impl): New class. (vldrq_impl): New class. (vldrbq): New. (vldrhq): New. (vldrwq): New. (vstrbq): New. (vstrhq): New. (vstrwq): New. * config/arm/arm-mve-builtins-base.def (vld1q): Add predicated version. (vldrbq): New. (vldrhq): New. (vldrwq): New. (vst1q): Add predicated version. (vstrbq): New. (vstrhq): New. (vstrwq): New. (vrev32q): Update types to float_16. 
* config/arm/arm-mve-builtins-base.h (vldrbq): New. (vldrhq): New. (vldrwq): New. (vstrbq): New. (vstrhq): New. (vstrwq): New. * config/arm/arm-mve-builtins-functions.h (memory_vector_mode): Remove conversion of floating point vectors to integer. * config/arm/arm-mve-builtins.cc (TYPES_float16): Change to... (TYPES_float_16): ...this. (TYPES_float_32): New. (float16): Change to... (float_16): ...this. (float_32): New. (preds_z_or_none): New. (function_resolver::check_gp_argument): Add support for _z predicate. * config/arm/arm_mve.h (vstrbq): Remove. (vstrbq_p): Likewise. (vstrhq): Likewise. (vstrhq_p): Likewise. (vstrwq): Likewise. (vstrwq_p): Likewise. (vst1q_p): Likewise. (vld1q_z): Likewise. (vldrbq_s8): Likewise. (vldrbq_u8): Likewise. (vldrbq_s16): Likewise. (vldrbq_u16): Likewise. (vldrbq_s32): Likewise. (vldrbq_u32): Likewise. (vstrbq_p_s8): Likewise. (vstrbq_p_s32): Likewise. (vstrbq_p_s16): Likewise. (vstrbq_p_u8): Likewise. (vstrbq_p_u32): Likewise. (vstrbq_p_u16): Likewise. (vldrbq_z_s16): Likewise. (vldrbq_z_u8): Likewise. (vldrbq_z_s8): Likewise. (vldrbq_z_s32): Likewise. (vldrbq_z_u16): Likewise. (vldrbq_z_u32): Likewise. (vldrhq_s32): Likewise. (vldrhq_s16): Likewise. (vldrhq_u32): Likewise. (vldrhq_u16): Likewise. (vldrhq_z_s32): Likewise. (vldrhq_z_s16): Likewise. (vldrhq_z_u32): Likewise. (vldrhq_z_u16): Likewise. (vldrwq_s32): Likewise. (vldrwq_u32): Likewise. (vldrwq_z_s32): Likewise. (vldrwq_z_u32): Likewise. (vldrhq_f16): Likewise. (vldrhq_z_f16): Likewise. (vldrwq_f32): Likewise. (vldrwq_z_f32): Likewise. (vstrhq_f16): Likewise. (vstrhq_s32): Likewise. (vstrhq_s16): Likewise. (vstrhq_u32): Likewise. (vstrhq_u16): Likewise. (vstrhq_p_f16): Likewise. (vstrhq_p_s32): Likewise. (vstrhq_p_s16): Likewise. (vstrhq_p_u32): Likewise. (vstrhq_p_u16): Likewise. (vstrwq_f32): Likewise. (vstrwq_s32): Likewise. (vstrwq_u32): Likewise. (vstrwq_p_f32): Likewise. (vstrwq_p_s32): Likewise. (vstrwq_p_u32): Likewise. (vst1q_p_u8): Likewise. 
(vst1q_p_s8): Likewise. (vld1q_z_u8): Likewise. (vld1q_z_s8): Likewise. (vst1q_p_u16): Likewise. (vst1q_p_s16): Likewise. (vld1q_z_u16): Likewise. (vld1q_z_s16): Likewise
Re: [PATCH v2 6/9] aarch64: Use symbols without offset to prevent relocation issues
On Thu, 19 Sep 2024, Richard Sandiford wrote: Martin Storsjö writes: On Thu, 12 Sep 2024, Evgeny Karpov wrote: The current binutils implementation does not support offset up to 4GB in IMAGE_REL_ARM64_PAGEBASE_REL21 relocation and is limited to 1MB. This is related to differences in ELF and COFF relocation records. Yes, I agree. But I would not consider this a limitation of the binutils implementation, this is a limitation of the object file format. It can't be worked around by inventing your own custom relocations, but should instead be worked around on the code generation side, to avoid needing such large offsets. This approach is one such workaround, and quite a valid one. Another one is to generate extra symbols to allow addressing anything with a smaller offset. Maybe this is my ELF bias showing, but: generating extra X=Y+OFF symbols isn't generally valid for ELF when Y is a global symbol, since interposition rules, comdat, weak symbols, and various other reasons, could mean that the local definition of Y isn't the one that gets used. Does COFF cope with that in some other way? If not, I would have expected that there would need to be a fallback path that didn't involve defining extra symbols. That's indeed a fair point. COFF doesn't cope with that in other ways - so defining such extra symbols to cope with the offsets, for global symbols that can be interposed or swapped out at linking stage, would indeed be wrong. In practice, I think it's rare to reference such an interposable symbol with an offset overall - even more so to reference it with an offset over 1 MB. The practical case where one mostly runs into the limitation is when you have large sections, and use temporary labels to reference positions within those sections. As the temporary labels don't persist into the object file, the references against temporary labels end up as against section base, plus an offset. And those symbols (the section base) aren't global.
The workaround I did for this within LLVM, https://github.com/llvm/llvm-project/commit/06d0d449d8555ae5f1ac33e8d4bb4ae40eb080d3, deals specifically only with temporary symbols. // Martin
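To make the arithmetic behind the 1 MB limit and the extra-symbol workaround concrete, here is a rough sketch. It assumes the addend has to fit in a 21-bit signed field, which is what would give the range discussed above; `addend_fits`, `AnchoredRef` and `split_offset` are hypothetical names, and this is not the code from the LLVM commit or from the GCC patch.

```cpp
#include <cstdint>

// Assumption: the relocation addend lives in a 21-bit signed field,
// i.e. roughly +/- 1 MB.
constexpr std::int64_t kMinAddend = -(std::int64_t{1} << 20);
constexpr std::int64_t kMaxAddend = (std::int64_t{1} << 20) - 1;

// True if OFFSET can be encoded directly against a symbol.
bool addend_fits(std::int64_t offset)
{
  return offset >= kMinAddend && offset <= kMaxAddend;
}

// A reference expressed as "anchor symbol number N, plus a small offset".
struct AnchoredRef
{
  std::uint64_t anchor_index;
  std::int64_t offset;
};

// Hypothetical scheme: place a local anchor symbol every INTERVAL bytes
// in a section and address a position relative to the anchor at or
// before it, so the remaining offset always stays below INTERVAL.
AnchoredRef split_offset(std::uint64_t section_offset,
                         std::uint64_t interval = std::uint64_t{1} << 20)
{
  return { section_offset / interval,
           static_cast<std::int64_t>(section_offset % interval) };
}
```

Such anchors are local to the object file, so the interposition concern raised above for global symbols does not apply to them.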
Re: [patch, fortran] Matmul and dot_product for unsigned
Hi Thomas, unfortunately I have some questions. Most of them are for my understanding. > diff --git a/gcc/fortran/arith.cc b/gcc/fortran/arith.cc > index 66a3635404a..a214b8bc1b3 100644 > --- a/gcc/fortran/arith.cc > +++ b/gcc/fortran/arith.cc > @@ -711,17 +711,9 @@ gfc_arith_uminus (gfc_expr *op1, gfc_expr **resultp) > case BT_UNSIGNED: >{ > if (pedantic) > - return ARITH_UNSIGNED_NEGATIVE; > + return check_result (ARITH_UNSIGNED_NEGATIVE, op1, result, resultp); What is the need for this check? ARITH_UNSIGNED_NEGATIVE is, when I read the code correctly, never triggering an error here. What do I not see? > > - arith neg_rc; > mpz_neg (result->value.integer, op1->value.integer); > - neg_rc = gfc_range_check (result); > diff --git a/gcc/fortran/check.cc b/gcc/fortran/check.cc > index cfafdb7974f..7c630dd73f4 100644 > --- a/gcc/fortran/check.cc > +++ b/gcc/fortran/check.cc > @@ -2804,6 +2804,10 @@ gfc_check_dot_product (gfc_expr *vector_a, gfc_expr > *vector_b) return false; >break; > > +case BT_UNSIGNED: > + /* Check comes later. */ > + break; > + > default: >gfc_error ("%qs argument of %qs intrinsic at %L must be numeric " >"or LOGICAL", gfc_current_intrinsic_arg[0]->name, > @@ -2811,6 +2815,14 @@ gfc_check_dot_product (gfc_expr *vector_a, gfc_expr > *vector_b) return false; > } > > + if (gfc_invalid_unsigned_ops (vector_a, vector_b)) I haven't read the proposal (shame to me), but why would want not want to combine a unsigned vector with a signed one? This all depends on the data type of the result (variable). So why is this needed here? (I know we don't have the result available here.) It just feels odd to me. 
> +{ > + gfc_error ("Argument types of %qs intrinsic at %L must match (%s/%s)", > + gfc_current_intrinsic, &vector_a->where, > + gfc_typename(&vector_a->ts), gfc_typename(&vector_b->ts)); > + return false; > +} > + >if (!rank_check (vector_a, 0, 1)) > return false; > > @@ -4092,7 +4104,8 @@ gfc_check_matmul (gfc_expr *matrix_a, gfc_expr > *matrix_b) } > >if ((matrix_a->ts.type == BT_LOGICAL && gfc_numeric_ts (&matrix_b->ts)) > - || (gfc_numeric_ts (&matrix_a->ts) && matrix_b->ts.type == BT_LOGICAL)) > + || (gfc_numeric_ts (&matrix_a->ts) && matrix_b->ts.type == BT_LOGICAL) > + || gfc_invalid_unsigned_ops (matrix_a, matrix_b)) Same here. > { >gfc_error ("Argument types of %qs intrinsic at %L must match (%s/%s)", >gfc_current_intrinsic, &matrix_a->where, > diff --git a/gcc/fortran/expr.cc b/gcc/fortran/expr.cc > index 81c641e2322..cef971894ea 100644 > --- a/gcc/fortran/expr.cc > +++ b/gcc/fortran/expr.cc > @@ -224,7 +224,19 @@ gfc_get_int_expr (int kind, locus *where, HOST_WIDE_INT > value) return p; > } > > +/* Get a new expression node that is an unsigned constant. */ > > +gfc_expr * > +gfc_get_unsigned_expr (int kind, locus *where, HOST_WIDE_INT value) > +{ > + gfc_expr *p; > + p = gfc_get_constant_expr (BT_UNSIGNED, kind, > + where ? where : &gfc_current_locus); > + const wide_int w = wi::shwi (value, kind * BITS_PER_UNIT); > + wi::to_mpz (w, p->value.integer, UNSIGNED); > + > + return p; > +} Newline please :-) > /* Get a new expression node that is a logical constant. */ > > gfc_expr * > diff --git a/libgfortran/m4/iparm.m4 b/libgfortran/m4/iparm.m4 > index b474620424b..0c4c76c2428 100644 > --- a/libgfortran/m4/iparm.m4 > +++ b/libgfortran/m4/iparm.m4 > @@ -4,7 +4,7 @@ dnl This file is part of the GNU Fortran 95 Runtime Library > (libgfortran) dnl Distributed under the GNU GPL with exception. See COPYING > for details. 
dnl M4 macro file to get type names from filenames > define(get_typename2, `GFC_$1_$2')dnl > -define(get_typename, > `get_typename2(ifelse($1,i,INTEGER,ifelse($1,r,REAL,ifelse($1,l,LOGICAL,ifelse($1,c,COMPLEX,ifelse($1,s,UINTEGER,unknown),`$2')')dnl > +define(get_typename, > `get_typename2(ifelse($1,i,INTEGER,ifelse($1,r,REAL,ifelse($1,l,LOGICAL,ifelse($1,c,COMPLEX,ifelse($1,m,UINTEGER,ifelse($1,s,UINTEGER,unknown)),`$2')')dnl Curiosity killed the cat: So type letter 's' and 'm' both signify a unsigned integer, right? Is there anywhere a notable difference? I meant, keep it simple is usually wanted and having to type letters with identical meaning is not simple, right? > define(get_arraytype, `gfc_array_$1$2')dnl define(define_type, `dnl > ifelse(regexp($2,`^[0-9]'),-1,`dnl diff --git a/libgfortran/m4/matmul.m4 > b/libgfortran/m4/matmul.m4 index 7fc1f5fa75f..cd804e8be06 100644 > --- a/libgfortran/m4/matmul.m4 > +++ b/libgfortran/m4/matmul.m4 > @@ -28,6 +28,9 @@ see the files COPYING3 and COPYING.RUNTIME respectively. > If not, see #include ' > > include(iparm.m4)dnl > +ifelse(index(rtype_name,`GFC_INTEGER'),`0',dnl > +define(`rtype_name',patsubst(rtype_name,`GFC_INTEGER',`GFC_UINTEGER'))dnl > +define(`rtype',patsubst(
Re: [PATCH] libcpp: Add -Wtrailing-blanks warning
On Thu, Sep 19, 2024 at 08:17:24AM +0200, Richard Biener wrote:
> On Wed, Sep 18, 2024 at 7:33 PM Jakub Jelinek wrote:
> >
> > On Wed, Sep 18, 2024 at 06:17:58PM +0100, Richard Sandiford wrote:
> > > +1 I'd much rather learn about this kind of error before the code reaches
> > > a review tool :)
> > >
> > > From a quick check, it doesn't look like Clang has this, so there is no
> > > existing name to follow.
> >
> > I was considering also -Wtrailing-whitespace, but
> > 1) git diff really warns just about trailing spaces/tabs, not form feeds or
> > vertical tabs
> > 2) gcc source contains tons of spots with form feed in it (though,
> > I think pretty much always as the sole character on a line).
> > And not really sure how people use vertical tabs in the source if at all.
> > Perhaps form feed could be not warned if at end of line if it isn't the sole
> > character on a line...
>
> Generally I like diagnosing this early. For the above I'd say
> -Wtrailing-whitespace=
> with a set of things to diagnose (and a sane default - just spaces and
> tabs - for
> -Wtrailing-whitespace) would be nice. As for naming possibly follow the
> is{space,blank,cntrl} character classifications? If those are a good
> fit, that is.

I think the character classifications risk problems.
space is ' ' '\t' '\n' '\r' '\f' '\v' in the C locale,
blank is ' ' '\t'
cntrl is a lot of chars but not ' '
if we extend by the safe-ctype
vspace '\r' '\n'
nvspace ' ' '\t' '\f' '\v' '\0'
Obviously, we shouldn't look at '\r' and '\n', those aren't trailing characters, those are line separators.
Would we need to consider all UTF-8 (or EBCDIC-UTF) control characters as cntrl?

0000..0009    ; Control # Cc  [10] <control-0000>..<control-0009>
000B..000C    ; Control # Cc   [2] <control-000B>..<control-000C>
000E..001F    ; Control # Cc  [18] <control-000E>..<control-001F>
007F..009F    ; Control # Cc  [33] <control-007F>..<control-009F>
00AD          ; Control # Cf       SOFT HYPHEN
061C          ; Control # Cf       ARABIC LETTER MARK
180E          ; Control # Cf       MONGOLIAN VOWEL SEPARATOR
200B          ; Control # Cf       ZERO WIDTH SPACE
200E..200F    ; Control # Cf   [2] LEFT-TO-RIGHT MARK..RIGHT-TO-LEFT MARK
2028          ; Control # Zl       LINE SEPARATOR
2029          ; Control # Zp       PARAGRAPH SEPARATOR
202A..202E    ; Control # Cf   [5] LEFT-TO-RIGHT EMBEDDING..RIGHT-TO-LEFT OVERRIDE
2060..2064    ; Control # Cf   [5] WORD JOINER..INVISIBLE PLUS
2065          ; Control # Cn       <reserved-2065>
2066..206F    ; Control # Cf  [10] LEFT-TO-RIGHT ISOLATE..NOMINAL DIGIT SHAPES
FEFF          ; Control # Cf       ZERO WIDTH NO-BREAK SPACE
FFF0..FFF8    ; Control # Cn   [9] <reserved-FFF0>..<reserved-FFF8>
FFF9..FFFB    ; Control # Cf   [3] INTERLINEAR ANNOTATION ANCHOR..INTERLINEAR ANNOTATION TERMINATOR
13430..1343F  ; Control # Cf  [16] EGYPTIAN HIEROGLYPH VERTICAL JOINER..EGYPTIAN HIEROGLYPH END WALLED ENCLOSURE
1BCA0..1BCA3  ; Control # Cf   [4] SHORTHAND FORMAT LETTER OVERLAP..SHORTHAND FORMAT UP STEP
1D173..1D17A  ; Control # Cf   [8] MUSICAL SYMBOL BEGIN BEAM..MUSICAL SYMBOL END PHRASE
E0000         ; Control # Cn       <reserved-E0000>
E0001         ; Control # Cf       LANGUAGE TAG
E0002..E001F  ; Control # Cn  [30] <reserved-E0002>..<reserved-E001F>
E0080..E00FF  ; Control # Cn [128] <reserved-E0080>..<reserved-E00FF>
E01F0..E0FFF  ; Control # Cn [3600] <reserved-E01F0>..<reserved-E0FFF>

Wonder why anybody would be interested to find just trailing spaces and not trailing tabs or vice versa, so if we have categories, blank would be one, then perhaps nvspace as something not including '\0', so just ' ' '\t' '\f' '\v' and if really needed, control characters with added ' ', but how to call that and would it really need to parse UTF-8/EBCDIC and look at pregenerated tables?

Jakub
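The two categories discussed above can be sketched as follows. This is only an illustration of the classification, not the libcpp implementation; `trailing_blanks` and `has_trailing_nvspace` are hypothetical names, and the special case for a lone form feed is one possible reading of the suggestion earlier in the thread.

```cpp
#include <cstddef>
#include <string>

// "blank" category: count trailing ' ' and '\t' characters.
// LINE is assumed to have had its line terminator removed already.
std::size_t trailing_blanks(const std::string &line)
{
  std::size_t n = 0;
  while (n < line.size())
    {
      const char c = line[line.size() - 1 - n];
      if (c != ' ' && c != '\t')
        break;
      ++n;
    }
  return n;
}

// "nvspace"-like category without '\0': ' ', '\t', '\f' and '\v'.
// A line whose sole character is a form feed is not reported, since
// such lines are common in the GCC sources.
bool has_trailing_nvspace(const std::string &line)
{
  if (line.empty() || line == "\f")
    return false;
  const char c = line.back();
  return c == ' ' || c == '\t' || c == '\f' || c == '\v';
}
```

Both helpers look only at single bytes, so neither needs to decode UTF-8 or consult generated tables; covering the wider set of control characters listed above would.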
Re: [PATCH v5 4/4] RISC-V: Fix vector SAT_ADD dump check due to middle-end change
> This patch would like fix the dump check times of vector SAT_ADD. The > middle-end change makes the match times from 2 to 4 times. > > The below test suites are passed for this patch. > * The rv64gcv fully regression test. That's OK. And I think testsuite fixup patches like this you can consider "obvious" as long as you're sure the underlying reason is understood. In particular as you have been working in the saturating space for a while now. So for the future I'd suggest you post those with a remark that you think they're obvious and going to commit in a day (or some other reasonable timeframe) if there are no complaints. -- Regards Robin
Re: [PATCH][v2] tree-optimization/116573 - .SELECT_VL for SLP
> On Tue, 17 Sep 2024, Richard Biener wrote: > > > The following restores the use of .SELECT_VL for testcases where it > > is safe to use even when using SLP. I've for now restricted it > > to single-lane SLP plus optimistically allow store-lane nodes > > and assume single-lane roots are not widened but at most to > > load-lane who should be fine. > > > > v2 fixes latent issues in vectorizable_load/store. > > > > Bootstrap and regtest running on x86_64-unknown-linux-gnu. > > So while this fixes the earlier observed 80 regressions from not using > SLP this now introduces many more from the CI (800), all in other > scan-assembler tests where after checking a sample of one (sic!) > we seem to use .SELECT_VL more now but expect not to. Unfortunately > none of the affected tests are runtime tests but at least for the > single test I investigated there is nothing wrong with using .SELECT_VL. > > I've checked the full CI results and as far I can see there are no > execute fails caused by this patch (I have locally done a full > check-gcc as well with a similar result). > > So I'm asking for explicit approval here. > > OK for trunk? Odd. With my testing, rv64 only though, I haven't observed any additional fallout. But the CI knows better, usually. While I worked on my patch (which ended up looking similar to yours) I also noticed that some examples now use SELECT_VL where we didn't before, and, they appeared reasonable to me. Definitely saw no execution failures either. So, I'd say let's go ahead. Once it is in we can deal with the fallout. Same as the LOAD_LANES fallout that I wanted to take care of as soon as our internal matters permit. Thanks for fixing it. -- Regards Robin
Re: [PATCH] RISC-V: Align vconfig for TARGET_SFB_ALU
Hi Dusan, sorry for the late reply. > This patch addresses a missed opportunity to fuse vsetvl_infos. > Instead of checking whether demands for merging configurations of > vsetvl_info are all met, the demands are checked individually. > > The case in question occurs because of the conditional move > instruction which sifive-7, sifive-p400 and sifive-p600 support. > Firstly, the conditional move generated rearranges the CFG. > Secondly, because the conditional move generated uses > the same register in the if_then_else pattern as vsetvli before it > curr_info and prev_info won't be merged. Can you elaborate a bit on that? Rearranging the CFG shouldn't matter in general and relying on the specific TARGET_SFB_ALU feels overly specific. Why does using the same register in the if_then_else pattern interfere with vsetvl? BTW Bohan Lei has since fixed a bug regarding non-RVV uses. Does the situation change with that applied? -- Regards Robin
Re: [PATCH v2 0/9] SMALL code model fixes, optimization fixes, LTO and minimal C++ enablement
Evgeny Karpov writes: > Hello, > > Thank you for reviewing v1! > > v2 Changes: > - Add extra comments and extend patch descriptions. > - Extract libstdc++ changes to a separate patch. > - Minor style refactoring based on the reviews. > - Unify mingw_pe_declare_type for functions and objects. Thanks for the update. Aside from the points raised in the discussion about patches 5, 6, and 9 (and taking into account what you said about patch 7), the series looks good. Thanks, Richard > > Regards, > Evgeny > > Evgeny Karpov (9): > Support weak references > aarch64: Add debugging information > aarch64: Add minimal C++ support > aarch64: Exclude symbols using GOT from code models > aarch64: Multiple adjustments to support the SMALL code model > correctly > aarch64: Use symbols without offset to prevent relocation issues > aarch64: Disable the anchors > Add LTO support > aarch64: Handle alignment when it is bigger than BIGGEST_ALIGNMENT > > gcc/config.gcc| 1 + > gcc/config/aarch64/aarch64-coff.h | 32 +++--- > gcc/config/aarch64/aarch64.cc | 43 --- > gcc/config/aarch64/cygming.h | 69 +-- > gcc/config/i386/cygming.h | 16 +++ > gcc/config/i386/i386-protos.h | 2 - > gcc/config/mingw/winnt-dll.cc | 4 +- > gcc/config/mingw/winnt.cc | 33 ++- > gcc/config/mingw/winnt.h | 7 ++-- > libiberty/simple-object-coff.c| 4 +- > 10 files changed, 158 insertions(+), 53 deletions(-)
Re: [PATCH v2 9/9] aarch64: Handle alignment when it is bigger than BIGGEST_ALIGNMENT
Evgeny Karpov writes:
> In some cases, the alignment can be bigger than BIGGEST_ALIGNMENT.
>
> The issue was detected while building FFmpeg.
> It creates structures, most likely for AVX optimization.
>
> For instance:
> float __attribute__((aligned (32))) large_aligned_array[3];
>
> BIGGEST_ALIGNMENT could be up to 512 bits on x64.
> This patch has been added to cover this case without needing to
> change the FFmpeg code.

What goes wrong if we don't do this? I'm not sure from the description whether it's a correctness fix, a performance fix, or whether it's about avoiding wasted space.

> gcc/ChangeLog:
>
> * config/aarch64/aarch64-coff.h (ASM_OUTPUT_ALIGNED_LOCAL):
> Change alignment.
> ---
> gcc/config/aarch64/aarch64-coff.h | 10 ++
> 1 file changed, 10 insertions(+)
>
> diff --git a/gcc/config/aarch64/aarch64-coff.h b/gcc/config/aarch64/aarch64-coff.h
> index 17f346fe540..bf8e30b9c08 100644
> --- a/gcc/config/aarch64/aarch64-coff.h
> +++ b/gcc/config/aarch64/aarch64-coff.h
> @@ -58,6 +58,16 @@
>    assemble_name ((FILE), (NAME)),\
>    fprintf ((FILE), ",%lu\n", (ROUNDED)))
>
> +#define ASM_OUTPUT_ALIGNED_LOCAL(FILE, NAME, SIZE, ALIGNMENT) \
> +  { \
> +    unsigned HOST_WIDE_INT rounded = MAX ((SIZE), 1); \
> +    unsigned HOST_WIDE_INT alignment = MAX ((ALIGNMENT), BIGGEST_ALIGNMENT); \
> +    rounded += (alignment / BITS_PER_UNIT) - 1; \
> +    rounded = (rounded / (alignment / BITS_PER_UNIT) \
> +      * (alignment / BITS_PER_UNIT)); \

There's a ROUND_UP macro that could be used here.

Thanks,
Richard

> +    ASM_OUTPUT_LOCAL (FILE, NAME, SIZE, rounded); \
> +  }
> +
>  #define ASM_OUTPUT_SKIP(STREAM, NBYTES) \
>    fprintf (STREAM, "\t.space\t%d // skip\n", (int) (NBYTES))
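For reference, the rounding that the quoted macro spells out by hand is the usual "round up to a multiple" computation. The sketch below shows only that arithmetic in plain C++; `round_up` and `rounded_local_size` are hypothetical stand-ins, not GCC's ROUND_UP macro, and the sketch deliberately leaves out the MAX against BIGGEST_ALIGNMENT from the patch.

```cpp
#include <cstdint>

// Round SIZE up to the next multiple of ALIGN_BYTES.
// ALIGN_BYTES must be nonzero; it need not be a power of two.
constexpr std::uint64_t round_up(std::uint64_t size, std::uint64_t align_bytes)
{
  return (size + align_bytes - 1) / align_bytes * align_bytes;
}

// Size in bytes, alignment in bits (as in the macro arguments above).
// A zero size is treated as one byte, mirroring MAX ((SIZE), 1).
constexpr std::uint64_t rounded_local_size(std::uint64_t size,
                                           std::uint64_t align_bits,
                                           unsigned bits_per_unit = 8)
{
  return round_up(size > 0 ? size : 1, align_bits / bits_per_unit);
}
```

With the FFmpeg-style example from the description, a 12-byte `float[3]` array aligned to 32 bytes (256 bits) rounds up to 32 bytes.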
Re: [PATCH v3] match: Fix A || B not optimized to true when !B implies A [PR114326]
I have sent a new version (https://gcc.gnu.org/pipermail/gcc-patches/2024-September/663350.html). I also added :c to the ne operations. Thanks, Konstantinos On Wed, Sep 18, 2024 at 1:52 PM Richard Biener wrote: > > On Wed, Sep 18, 2024 at 10:42 AM Konstantinos Eleftheriou > wrote: > > > > On Mon, Sep 9, 2024 at 3:11 PM Richard Biener > > wrote: > > > > > > On Thu, Aug 29, 2024 at 9:03 AM wrote: > > > > > > > > From: kelefth > > > > > > > > In expressions like (a != b || ((a ^ b) & c) == d) and > > > > (a != b || (a ^ b) == c), (a ^ b) is folded to false. > > > > In the equivalent expressions (((a ^ b) & c) == d || a != b) and > > > > ((a ^ b) == c || a != b) this is not happening. > > > > > > > > This patch adds the following simplifications in match.pd: > > > > ((a ^ b) & c) == d || a != b --> 0 == d || a != b > > > > (a ^ b) == c || a != b --> 0 == c || a != b > > > > > > > > PR tree-optimization/114326 > > > > > > > > gcc/ChangeLog: > > > > > > > > * match.pd: Add two patterns to fold a ^ b to 0, when a == b. > > > > > > > > gcc/testsuite/ChangeLog: > > > > > > > > * gcc.dg/tree-ssa/fold-xor-and-or.c: New test. > > > > * gcc.dg/tree-ssa/fold-xor-or.c: New test. 
> > > > > > > > Reviewed-by: Christoph Müllner > > > > Signed-off-by: Philipp Tomsich > > > > Signed-off-by: Konstantinos Eleftheriou > > > > > > > > --- > > > > gcc/match.pd | 30 ++ > > > > .../gcc.dg/tree-ssa/fold-xor-and-or.c | 31 +++ > > > > gcc/testsuite/gcc.dg/tree-ssa/fold-xor-or.c | 31 +++ > > > > 3 files changed, 92 insertions(+) > > > > create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/fold-xor-and-or.c > > > > create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/fold-xor-or.c > > > > > > > > diff --git a/gcc/match.pd b/gcc/match.pd > > > > index be211535a49..6bab3cfbde1 100644 > > > > --- a/gcc/match.pd > > > > +++ b/gcc/match.pd > > > > @@ -10727,3 +10727,33 @@ and, > > > >} > > > >(if (full_perm_p) > > > > (vec_perm (op@3 @0 @1) @3 @2)) > > > > > >> Can you please place those patterns next to related ones? I suggest > > >> after (type)([0,1]@a != 0) -> (type)a and before > > >> /* We can't reassociate at all for saturating types. */ > > > > Yes, I will fix that. > > > > > > > > > > +/* ((a ^ b) & c) == d || a != b --> (0 == d || a != b). */ > > > > > >> The comment indicates == d but you also test for other > > >> comparison ops. As far as I can see your testcases also > > >> only cover ==. > > > > I will change "==" to "cmp" in the comments and add additional testcases. > > My intention is to cover all operations included in "simple_comparison". > > > > > > > > > > +(for cmp (simple_comparison) > > > > + (simplify > > > > +(bit_ior > > > > + (cmp > > > > + (bit_and > > > > > > This needs :c > > > > > > > + (bit_xor @0 @1) > > > > > > Likewise. > > > > Right, I will fix these cases. > > > > > > > > >> I think you also need :c on the comparison to match > > >> d == (...). > > > > In that case, I would need to handle non-commutative operations (e.g. > > >) separately, right? > > The non-commutative operations can be handled as well, the tree code > will be inverted accordingly. > > Richard. 
> > > > > > > > > > > + tree_expr_nonzero_p@2) > > > > + @3) > > > > + (ne@4 @0 @1)) > > > > +(bit_ior > > > > + (cmp > > > > + { build_zero_cst (TREE_TYPE (@0)); } > > > > + @3) > > > > + @4))) > > > > + > > > > +/* (a ^ b) == c || a != b --> (0 == c || a != b). */ > > > > +(for cmp (simple_comparison) > > > > + (simplify > > > > +(bit_ior > > > > + (cmp > > > > + (bit_xor @0 @1) > > > > > > similar, :c here and also on the comparison. Same > > > question with regard to == c. > > > > > > > + @2) > > > > + (ne@3 @0 @1)) > > > > +(bit_ior > > > > + (cmp > > > > + { build_zero_cst (TREE_TYPE (@0)); } > > > > + @2) > > > > + @3))) > > > > \ No newline at end of file > > > > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/fold-xor-and-or.c > > > > b/gcc/testsuite/gcc.dg/tree-ssa/fold-xor-and-or.c > > > > new file mode 100644 > > > > index 000..ec327e62f6e > > > > --- /dev/null > > > > +++ b/gcc/testsuite/gcc.dg/tree-ssa/fold-xor-and-or.c > > > > @@ -0,0 +1,31 @@ > > > > +/* { dg-do compile } */ > > > > +/* { dg-options "-O3 -fdump-tree-optimized" } */ > > > > + > > > > +int cmp1(int d1, int d2) { > > > > + if (((d1 ^ d2) & 0xabcd) == 0 || d1 != d2) > > > > +return 0; > > > > + return 1; > > > > +} > > > > + > > > > +int cmp2(int d1, int d2) { > > > > + if (d1 != d2 || ((d1 ^ d2) & 0xabcd) == 0) > > > > +return 0; > > > > + return 1; > > > > +} > > > > + > > > > +typedef unsigned long int uint64_t; > > > > + > > > > +int cmp1_64(uint64_t d1, uint64_t d2) { > > > > + if (((d1 ^ d2) & 0xabcd) == 0 || d1 != d2) > > > > +return 0; > > > > + return 1; >
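The folds discussed in this thread can be sanity-checked in plain C. The sketch below is not part of the patch; the function names are hypothetical, and it only covers the `==` comparison over a small range of values. The reasoning it exercises: whenever `a != b` is false, `a == b` holds, so `a ^ b` is 0 and the left-hand comparison can use 0 in its place.

```c
#include <assert.h>
#include <stdbool.h>

/* (a ^ b) == c || a != b, as written in the source.  */
static bool
orig_form (int a, int b, int c)
{
  return (a ^ b) == c || a != b;
}

/* 0 == c || a != b, the form after folding a ^ b to 0.  */
static bool
folded_form (int a, int b, int c)
{
  return 0 == c || a != b;
}

/* ((a ^ b) & c) == d || a != b, as written in the source.  */
static bool
orig_masked (int a, int b, int c, int d)
{
  return ((a ^ b) & c) == d || a != b;
}

/* 0 == d || a != b, the form after folding.  */
static bool
folded_masked (int a, int b, int d)
{
  return 0 == d || a != b;
}

/* Compare original and folded forms over a small range of inputs and
   return the number of disagreements.  */
static int
count_mismatches (void)
{
  int mismatches = 0;
  for (int a = -4; a <= 4; a++)
    for (int b = -4; b <= 4; b++)
      for (int c = -4; c <= 4; c++)
	{
	  if (orig_form (a, b, c) != folded_form (a, b, c))
	    mismatches++;
	  for (int d = -4; d <= 4; d++)
	    if (orig_masked (a, b, c, d) != folded_masked (a, b, d))
	      mismatches++;
	}
  return mismatches;
}
```

A brute-force check like this says nothing about how match.pd matches operand order, which is what the `:c` review comments are about; it only confirms that the rewritten expression has the same value.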
[PATCH v2] aarch64: Add fp8 scalar types
The ACLE defines a new scalar type, __mfp8. This is an opaque 8-bit type that
can only be used by fp8 intrinsics. Additionally, the mfloat8_t type is made
available in arm_neon.h and arm_sve.h as an alias of the same.

This implementation uses an unsigned INTEGER_TYPE, with precision 8 to
represent __mfp8. Conversions to int and other types are disabled via the
TARGET_INVALID_CONVERSION hook.
Additionally, operations that are typically available to integer types are
disabled via TARGET_INVALID_UNARY_OP and TARGET_INVALID_BINARY_OP hooks.

gcc/ChangeLog:

* config/aarch64/aarch64-builtins.cc (aarch64_mfp8_type_node): Add node for __mfp8 type.
(aarch64_mfp8_ptr_type_node): Add node for __mfp8 pointer type.
(aarch64_init_fp8_types): New function to initialise fp8 types and register with language backends.
* config/aarch64/aarch64.cc (aarch64_mangle_type): Add ABI mangling for new type.
(aarch64_invalid_conversion): Add function implementing TARGET_INVALID_CONVERSION hook that blocks conversion to and from the __mfp8 type.
(aarch64_invalid_unary_op): Add function implementing TARGET_UNARY_OP hook that blocks operations on __mfp8 other than &.
(aarch64_invalid_binary_op): Extend TARGET_BINARY_OP hook to disallow operations on __mfp8 type.
(TARGET_INVALID_CONVERSION): Add define.
(TARGET_INVALID_UNARY_OP): Likewise.
* config/aarch64/aarch64.h (aarch64_mfp8_type_node): Add node for __mfp8 type.
(aarch64_mfp8_ptr_type_node): Add node for __mfp8 pointer type.
* config/aarch64/arm_neon.h (mfloat8_t): Add typedef.
* config/aarch64/arm_sve.h (mfloat8_t): Likewise.

gcc/testsuite/ChangeLog:

* g++.target/aarch64/fp8_mangling.C: New tests exercising mangling.
* g++.target/aarch64/fp8_scalar_typecheck_2.C: New tests in C++.
* gcc.target/aarch64/fp8_scalar_1.c: New tests in C.
* gcc.target/aarch64/fp8_scalar_typecheck_1.c: Likewise.
---

Hi,
Is this ok for master? I do not have commit rights yet, if ok, can someone commit it on my behalf?
Regression tested with aarch64-unknown-linux-gnu. Compared to V1 of the patch, in version 2: - mangling for the __mfp8 type was added along with tests - unneeded comments were removed - simplified type checks in hooks - simplified initialization of aarch64_mfp8_type_node - separated mfloat8_t define from other fp types in arm_sve.h - C++ tests were moved to g++.target/aarch64 - added more tests around binary operations, function declaration, type traits - added tests exercising loads and stores from floating point registers Thanks, Claudio Bantaloukas gcc/config/aarch64/aarch64-builtins.cc| 20 + gcc/config/aarch64/aarch64.cc | 54 ++- gcc/config/aarch64/aarch64.h | 5 + gcc/config/aarch64/arm_neon.h | 2 + gcc/config/aarch64/arm_sve.h | 2 + .../g++.target/aarch64/fp8_mangling.C | 44 ++ .../aarch64/fp8_scalar_typecheck_2.C | 381 ++ .../gcc.target/aarch64/fp8_scalar_1.c | 134 ++ .../aarch64/fp8_scalar_typecheck_1.c | 356 9 files changed, 996 insertions(+), 2 deletions(-) create mode 100644 gcc/testsuite/g++.target/aarch64/fp8_mangling.C create mode 100644 gcc/testsuite/g++.target/aarch64/fp8_scalar_typecheck_2.C create mode 100644 gcc/testsuite/gcc.target/aarch64/fp8_scalar_1.c create mode 100644 gcc/testsuite/gcc.target/aarch64/fp8_scalar_typecheck_1.c diff --git a/gcc/config/aarch64/aarch64-builtins.cc b/gcc/config/aarch64/aarch64-builtins.cc index eb878b933fe..7d17df05a0f 100644 --- a/gcc/config/aarch64/aarch64-builtins.cc +++ b/gcc/config/aarch64/aarch64-builtins.cc @@ -961,6 +961,11 @@ static GTY(()) tree aarch64_simd_intOI_type_node = NULL_TREE; static GTY(()) tree aarch64_simd_intCI_type_node = NULL_TREE; static GTY(()) tree aarch64_simd_intXI_type_node = NULL_TREE; +/* The user-visible __mfp8 type, and a pointer to that type. Used + across the back-end. */ +tree aarch64_mfp8_type_node = NULL_TREE; +tree aarch64_mfp8_ptr_type_node = NULL_TREE; + /* The user-visible __fp16 type, and a pointer to that type. Used across the back-end. 
*/ tree aarch64_fp16_type_node = NULL_TREE; @@ -1721,6 +1726,19 @@ aarch64_init_builtin_rsqrt (void) } } +/* Initialize the backend type that supports the user-visible __mfp8 + type and its relative pointer type. */ + +static void +aarch64_init_fp8_types (void) +{ + aarch64_mfp8_type_node = make_unsigned_type (8); + SET_TYPE_MODE (aarch64_mfp8_type_node, QImode); + + lang_hooks.types.register_builtin_type (aarch64_mfp8_type_node, "__mfp8"); + aarch64_mfp8_ptr_type_node = build_pointer_type (aarch64_mfp8_type_node); +} + /* Initialize the backend types that support the user-visible __fp16 type, also initialize a pointer to that type, to be used when fo
[PATCH] RISC-V: testsuite: Fix SELECT_VL SLP fallout.
Hi, this fixes asm-scan fallout from r15-3712-g5e3a4a01785e2d where we allow SLP with SELECT_VL. Assisted by sed and regtested on rv64gcv_zvfh_zvbb. Rather lengthy but obvious, so going to commit after a while if the CI is happy. I think those tests don't really need to check for vsetvl anyway, not all of them at least but I didn't change that for now. Regards Robin gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/binop/vec_sat_s_add-1.c: Expect length-controlled loop. * gcc.target/riscv/rvv/autovec/binop/vec_sat_s_add-2.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/vec_sat_s_add-3.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/vec_sat_s_add-4.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-1.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-10.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-11.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-12.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-13.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-14.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-15.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-16.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-17.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-18.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-19.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-2.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-20.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-21.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-22.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-23.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-24.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-25.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-26.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-27.c: Ditto. 
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-28.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-29.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-3.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-30.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-31.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-32.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-4.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-5.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-6.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-7.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-8.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-9.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-1.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-10.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-11.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-12.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-13.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-14.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-15.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-16.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-17.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-18.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-19.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-2.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-20.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-21.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-22.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-23.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-24.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-25.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-26.c: Ditto. 
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-27.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-28.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-29.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-3.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-30.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-31.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-32.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-33.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/
Re: [PATCH v2] aarch64: Add fp8 scalar types
On 9/19/2024 2:18 PM, Kyrylo Tkachov wrote:
> Hi Claudio,
>
>> On 19 Sep 2024, at 15:09, Claudio Bantaloukas wrote:
>>
>> The ACLE defines a new scalar type, __mfp8. This is an opaque 8-bit type that
>> can only be used by fp8 intrinsics. Additionally, the mfloat8_t type is made
>> available in arm_neon.h and arm_sve.h as an alias of the same.
>>
>> This implementation uses an unsigned INTEGER_TYPE, with precision 8 to
>> represent __mfp8. Conversions to int and other types are disabled via the
>> TARGET_INVALID_CONVERSION hook.
>> Additionally, operations that are typically available to integer types are
>> disabled via TARGET_INVALID_UNARY_OP and TARGET_INVALID_BINARY_OP hooks.
>>
>> gcc/ChangeLog:
>>
>> * config/aarch64/aarch64-builtins.cc (aarch64_mfp8_type_node): Add node for __mfp8 type.
>> (aarch64_mfp8_ptr_type_node): Add node for __mfp8 pointer type.
>> (aarch64_init_fp8_types): New function to initialise fp8 types and register with language backends.
>> * config/aarch64/aarch64.cc (aarch64_mangle_type): Add ABI mangling for new type.
>> (aarch64_invalid_conversion): Add function implementing TARGET_INVALID_CONVERSION hook that blocks conversion to and from the __mfp8 type.
>> (aarch64_invalid_unary_op): Add function implementing TARGET_UNARY_OP hook that blocks operations on __mfp8 other than &.
>> (aarch64_invalid_binary_op): Extend TARGET_BINARY_OP hook to disallow operations on __mfp8 type.
>> (TARGET_INVALID_CONVERSION): Add define.
>> (TARGET_INVALID_UNARY_OP): Likewise.
>> * config/aarch64/aarch64.h (aarch64_mfp8_type_node): Add node for __mfp8 type.
>> (aarch64_mfp8_ptr_type_node): Add node for __mfp8 pointer type.
>> * config/aarch64/arm_neon.h (mfloat8_t): Add typedef.
>> * config/aarch64/arm_sve.h (mfloat8_t): Likewise.
>
> Looks like this typedef is a good candidate to go into arm_private_fp8.h so
> that arm_neon.h, arm_sve.h and arm_sme.h inherit it.

Hi Kyrill, thanks for the quick review.
The thought of using arm_private_fp8.h crossed my mind but I thought that ultimately it made more sense to follow existing practice and place the typedef near existing ones for bfloat types. If you feel strongly about this, I'll make the suggested change, but I'd rather keep it as is. As you can see, the rest of the patch borrows heavily in style from the bfloat implementation and my hope is that the closeness in code will aid in maintainability. Let me know :) Cheers, Claudio Thanks, Kyrill gcc/testsuite/ChangeLog: * g++.target/aarch64/fp8_mangling.C: New tests exercising mangling. * g++.target/aarch64/fp8_scalar_typecheck_2.C: New tests in C++. * gcc.target/aarch64/fp8_scalar_1.c: New tests in C. * gcc.target/aarch64/fp8_scalar_typecheck_1.c: Likewise. --- Hi, Is this ok for master? I do not have commit rights yet, if ok, can someone commit it on my behalf? Regression tested with aarch64-unknown-linux-gnu. Compared to V1 of the patch, in version 2: - mangling for the __mfp8 type was added along with tests - unneeded comments were removed - simplified type checks in hooks - simplified initialization of aarch64_mfp8_type_node - separated mfloat8_t define from other fp types in arm_sve.h - C++ tests were moved to g++.target/aarch64 - added more tests around binary operations, function declaration, type traits - added tests exercising loads and stores from floating point registers Thanks, Claudio Bantaloukas gcc/config/aarch64/aarch64-builtins.cc| 20 + gcc/config/aarch64/aarch64.cc | 54 ++- gcc/config/aarch64/aarch64.h | 5 + gcc/config/aarch64/arm_neon.h | 2 + gcc/config/aarch64/arm_sve.h | 2 + .../g++.target/aarch64/fp8_mangling.C | 44 ++ .../aarch64/fp8_scalar_typecheck_2.C | 381 ++ .../gcc.target/aarch64/fp8_scalar_1.c | 134 ++ .../aarch64/fp8_scalar_typecheck_1.c | 356 9 files changed, 996 insertions(+), 2 deletions(-) create mode 100644 gcc/testsuite/g++.target/aarch64/fp8_mangling.C create mode 100644 gcc/testsuite/g++.target/aarch64/fp8_scalar_typecheck_2.C 
create mode 100644 gcc/testsuite/gcc.target/aarch64/fp8_scalar_1.c create mode 100644 gcc/testsuite/gcc.target/aarch64/fp8_scalar_typecheck_1.c diff --git a/gcc/config/aarch64/aarch64-builtins.cc b/gcc/config/aarch64/aarch64-builtins.cc index eb878b933fe..7d17df05a0f 100644 --- a/gcc/config/aarch64/aarch64-builtins.cc +++ b/gcc/config/aarch64/aarch64-builtins.cc @@ -961,6 +961,11 @@ static GTY(()) tree aarch64_simd_intOI_type_node = NULL_TREE; static GTY(()) tree aarch64_simd_intCI_type_node = NULL_TREE; static GTY(()) tree aarch64_simd_intXI_type_node = NULL_TREE; +/* The user-visible __mfp8 type, and a pointer to that type. Used + across the back-end. */ +tree aarch64_mfp8_
Re: [PATCH v5 4/4] RISC-V: Fix vector SAT_ADD dump check due to middle-end change
On 9/19/24 4:11 AM, Li, Pan2 wrote:
>> So for the future I'd suggest you post those with a remark that you think
>> they're obvious and going to commit in a day (or some other reasonable
>> timeframe) if there are no complaints.
>
> Oh, I see. Thanks Robin for the reminder. That would be perfect. Do you have
> any best practices for the "obvious" remark? Like [NFC] in the subject to
> give a hint that there is no functional change, or maybe [TBO] standing for
> to-be-obvious or something like that.

Typically we say something like "pushing as obvious".

jeff
Re: [PATCH] s390: Remove -m{,no-}lra option
Stefan Schulze Frielinghaus writes: > I have been missing the two test cases and removed them since they > depend on -mno-lra. Can't approve but it looks right. Thanks for handling it, especially so quickly! > > -- 8< -- > > Since the old reload pass is about to be removed and we defaulted to LRA > for over a decade, remove option -m{,no-}lra. > > PR target/113953 > > gcc/ChangeLog: > > * config/s390/s390.cc (s390_lra_p): Remove. > (TARGET_LRA_P): Remove. > * config/s390/s390.opt (mlra): Remove. > * config/s390/s390.opt.urls (mlra): Remove. > > gcc/testsuite/ChangeLog: > > * gcc.target/s390/TI-constants-nolra.c: Removed. > * gcc.target/s390/pr79895.c: Removed. > --- > gcc/config/s390/s390.cc | 10 > gcc/config/s390/s390.opt | 4 -- > gcc/config/s390/s390.opt.urls | 2 - > .../gcc.target/s390/TI-constants-nolra.c | 47 --- > gcc/testsuite/gcc.target/s390/pr79895.c | 9 > 5 files changed, 72 deletions(-) > delete mode 100644 gcc/testsuite/gcc.target/s390/TI-constants-nolra.c > delete mode 100644 gcc/testsuite/gcc.target/s390/pr79895.c > > diff --git a/gcc/config/s390/s390.cc b/gcc/config/s390/s390.cc > index c9172d1153a..25d43ae3e13 100644 > --- a/gcc/config/s390/s390.cc > +++ b/gcc/config/s390/s390.cc > @@ -11342,13 +11342,6 @@ s390_can_change_mode_class (machine_mode from_mode, >return true; > } > > -/* Return true if we use LRA instead of reload pass. */ > -static bool > -s390_lra_p (void) > -{ > - return s390_lra_flag; > -} > - > /* Return true if register FROM can be eliminated via register TO. 
*/ > > static bool > @@ -18444,9 +18437,6 @@ s390_c_mode_for_floating_type (enum tree_index ti) > #undef TARGET_LEGITIMATE_CONSTANT_P > #define TARGET_LEGITIMATE_CONSTANT_P s390_legitimate_constant_p > > -#undef TARGET_LRA_P > -#define TARGET_LRA_P s390_lra_p > - > #undef TARGET_CAN_ELIMINATE > #define TARGET_CAN_ELIMINATE s390_can_eliminate > > diff --git a/gcc/config/s390/s390.opt b/gcc/config/s390/s390.opt > index a5b5aa95a12..23ea4b8232d 100644 > --- a/gcc/config/s390/s390.opt > +++ b/gcc/config/s390/s390.opt > @@ -229,10 +229,6 @@ Set the branch costs for conditional branch > instructions. Reasonable > values are small, non-negative integers. The default branch cost is > 1. > > -mlra > -Target Var(s390_lra_flag) Init(1) Save > -Use LRA instead of reload. > - > mpic-data-is-text-relative > Target Var(s390_pic_data_is_text_relative) > Init(TARGET_DEFAULT_PIC_DATA_IS_TEXT_RELATIVE) > Assume data segments are relative to text segment. > diff --git a/gcc/config/s390/s390.opt.urls b/gcc/config/s390/s390.opt.urls > index ab1e761efa8..bc772d2ffc7 100644 > --- a/gcc/config/s390/s390.opt.urls > +++ b/gcc/config/s390/s390.opt.urls > @@ -74,8 +74,6 @@ > UrlSuffix(gcc/S_002f390-and-zSeries-Options.html#index-mzarch) > > ; skipping UrlSuffix for 'mbranch-cost=' due to finding no URLs > > -; skipping UrlSuffix for 'mlra' due to finding no URLs > - > ; skipping UrlSuffix for 'mpic-data-is-text-relative' due to finding no URLs > > ; skipping UrlSuffix for 'mindirect-branch=' due to finding no URLs > diff --git a/gcc/testsuite/gcc.target/s390/TI-constants-nolra.c > b/gcc/testsuite/gcc.target/s390/TI-constants-nolra.c > deleted file mode 100644 > index b9948fc4aa5..000 > --- a/gcc/testsuite/gcc.target/s390/TI-constants-nolra.c > +++ /dev/null > @@ -1,47 +0,0 @@ > -/* { dg-do compile { target int128 } } */ > -/* { dg-options "-O3 -mno-lra" } */ > - > -/* 2x lghi */ > -__int128 a() { > - return 0; > -} > - > -/* 2x lghi */ > -__int128 b() { > - return -1; > -} > - > -/* 2x lghi */ 
> -__int128 c() { > - return -2; > -} > - > -/* lghi + llilh */ > -__int128 d() { > - return 16000 << 16; > -} > - > -/* lghi + llihf */ > -__int128 e() { > - return (unsigned long long)8 << 32; > -} > - > -/* lghi + llihf */ > -__int128 f() { > - return (unsigned __int128)8 << 96; > -} > - > -/* llihf + llihf - this is handled via movti_bigconst pattern */ > -__int128 g() { > - return ((unsigned __int128)8 << 96) | ((unsigned __int128)8 << 32); > -} > - > -/* Literal pool */ > -__int128 h() { > - return ((unsigned __int128)8 << 32) | 1; > -} > - > -/* Literal pool */ > -__int128 i() { > - return (((unsigned __int128)8 << 32) | 1) << 64; > -} > diff --git a/gcc/testsuite/gcc.target/s390/pr79895.c > b/gcc/testsuite/gcc.target/s390/pr79895.c > deleted file mode 100644 > index 02374e4b8a8..000 > --- a/gcc/testsuite/gcc.target/s390/pr79895.c > +++ /dev/null > @@ -1,9 +0,0 @@ > -/* { dg-do compile { target int128 } } */ > -/* { dg-options "-O1 -mno-lra" } */ > - > -unsigned __int128 g; > -void > -foo () > -{ > - g = (unsigned __int128)1 << 127; > -}
Re: [patch, fortran] Add random numbers and fix some bugs.
On 19.09.24 at 12:16, Andre Vehreschild wrote:
> Hi Thomas,
>
> submitting your patch as part of the mail got it corrupted by some mailer
> adding line breaks. It does not apply for me. Because I can't test it, I have
> more questions, see below:

I have attached it.

> On Wed, 18 Sep 2024 22:22:15 +0200 Thomas Koenig wrote:
>
>> This patch adds random number support for UNSIGNED, plus fixes
>> two bugs, with array I/O where the type used to be set to BT_INTEGER,
>> and for division with the divisor being a constant.
>>
>> Again, depends on previous submissions.
>>
>> OK for trunk?
>>
>> gcc/fortran/ChangeLog:
>>
>> * check.cc (gfc_check_random_number): Adjust for unsigned.
>> * iresolve.cc (gfc_resolve_random_number): Handle unsinged.
>
> Hihi, I do this typo, too, over and over again: s/unsinged/unsigned/

Yep :-) It's like it is burned into my fingers or something.

>> * trans-expr.cc (gfc_conv_expr_op): Handle BT_UNSIGNED for divide.
>> * trans-types.cc (gfc_get_dtype_rank_type): Handle BT_UNSIGNED.
>> * gfortran.texi: Add RANDOM_NUMBER for UNSIGNED.
>>
>> diff --git a/gcc/fortran/check.cc b/gcc/fortran/check.cc
>> index 533c9d7d343..1851cfb8d4a 100644
>> --- a/gcc/fortran/check.cc
>> +++ b/gcc/fortran/check.cc
>> @@ -7007,8 +7007,14 @@ gfc_check_random_init (gfc_expr *repeatable, gfc_expr *image_distinct)
>> bool
>> gfc_check_random_number (gfc_expr *harvest)
>> {
>> - if (!type_check (harvest, 0, BT_REAL))
>> -return false;
>> + if (flag_unsigned)
>> +{
>> + if (!type_check2 (harvest, 0, BT_REAL, BT_UNSIGNED))
>> + return false;
>
> When the second argument is a BT_INTEGER, does this fail here?

As it should. RANDOM_NUMBER usually is for REALs only. I thought it an obvious idea to extend it to unsigned integers, but only got the idea after the document was finalized, so I'm implementing it anyway.
>> +}
>> + else
>> +if (!type_check (harvest, 0, BT_REAL))
>> + return false;
>>
>> if (!variable_check (harvest, 0, false))
>> return false;

Best regards

Thomas

From 898be1e536614f6a8eb2cb59c3dbcd8277922d8f Mon Sep 17 00:00:00 2001
From: Thomas Koenig
Date: Wed, 18 Sep 2024 22:02:03 +0200
Subject: [PATCH 2/2] Add random numbers and fix some bugs.

This patch adds random number support for UNSIGNED, plus fixes
two bugs, with array I/O where the type used to be set to BT_INTEGER,
and for division with the divisor being a constant.

gcc/fortran/ChangeLog:

* check.cc (gfc_check_random_number): Adjust for unsigned.
* iresolve.cc (gfc_resolve_random_number): Handle unsigned.
* trans-expr.cc (gfc_conv_expr_op): Handle BT_UNSIGNED for divide.
* trans-types.cc (gfc_get_dtype_rank_type): Handle BT_UNSIGNED.
* gfortran.texi: Add RANDOM_NUMBER for UNSIGNED.

libgfortran/ChangeLog:

* gfortran.map: Add _gfortran_random_m1, _gfortran_random_m2, _gfortran_random_m4, _gfortran_random_m8 and _gfortran_random_m16.
* intrinsics/random.c (random_m1): New function.
(random_m2): New function.
(random_m4): New function.
(random_m8): New function.
(random_m16): New function.
(arandom_m1): New function.
(arandom_m2): New function.
(arandom_m4): New function.
(arandom_m8): New function.
(arandom_m16): New function.

gcc/testsuite/ChangeLog:

* gfortran.dg/unsigned_30.f90: New test.
--- gcc/fortran/check.cc | 10 +- gcc/fortran/gfortran.texi | 1 + gcc/fortran/iresolve.cc | 6 +- gcc/fortran/trans-expr.cc | 4 +- gcc/fortran/trans-types.cc| 7 +- gcc/testsuite/gfortran.dg/unsigned_30.f90 | 63 libgfortran/gfortran.map | 10 + libgfortran/intrinsics/random.c | 440 ++ 8 files changed, 534 insertions(+), 7 deletions(-) create mode 100644 gcc/testsuite/gfortran.dg/unsigned_30.f90 diff --git a/gcc/fortran/check.cc b/gcc/fortran/check.cc index 533c9d7d343..1851cfb8d4a 100644 --- a/gcc/fortran/check.cc +++ b/gcc/fortran/check.cc @@ -7007,8 +7007,14 @@ gfc_check_random_init (gfc_expr *repeatable, gfc_expr *image_distinct) bool gfc_check_random_number (gfc_expr *harvest) { - if (!type_check (harvest, 0, BT_REAL)) -return false; + if (flag_unsigned) +{ + if (!type_check2 (harvest, 0, BT_REAL, BT_UNSIGNED)) + return false; +} + else +if (!type_check (harvest, 0, BT_REAL)) + return false; if (!variable_check (harvest, 0, false)) return false; diff --git a/gcc/fortran/gfortran.texi b/gcc/fortran/gfortran.texi index 3eb8039c09f..a5ebadff3bb 100644 --- a/gcc/fortran/gfortran.texi +++ b/gcc/fortran/gfortran.texi @@ -2790,6 +2790,7 @@ As of now, the following intrinsics take unsigned arguments: @item @code{TRANSFER} @item @code{SUM}, @code{PRODUCT}, @code{MATMUL} and @code{DOT_PRODUCT} @item @code{IANY}, @code{IALL} and @code{IPARITY} +@item @code{RANDOM_NUMBER}. @end itemize This list will grow in the near future. @c - diff --git a/gcc/fortran/iresolve.cc b