Re: [PATCH] x86-64: Don't use temp for argument in a TImode register
On Fri, Sep 6, 2024 at 2:24 PM H.J. Lu wrote:
>
> Don't use temp for a PARALLEL BLKmode argument of an EXPR_LIST expression
> in a TImode register.  Otherwise, the TImode variable will be put in
> the GPR save area which guarantees only 8-byte alignment.
>
> gcc/
>
> PR target/116621
> * config/i386/i386.cc (ix86_gimplify_va_arg): Don't use temp for
> a PARALLEL BLKmode container of an EXPR_LIST expression in a
> TImode register.
>
> gcc/testsuite/
>
> PR target/116621
> * gcc.target/i386/pr116621.c: New test.

LGTM.

Thanks,
Uros.

> Signed-off-by: H.J. Lu
> ---
>  gcc/config/i386/i386.cc                  | 22 ++--
>  gcc/testsuite/gcc.target/i386/pr116621.c | 43 ++++++++++
>  2 files changed, 63 insertions(+), 2 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr116621.c
>
> diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> index 707b75a6d5d..45320124b91 100644
> --- a/gcc/config/i386/i386.cc
> +++ b/gcc/config/i386/i386.cc
> @@ -4908,13 +4908,31 @@ ix86_gimplify_va_arg (tree valist, tree type,
> gimple_seq *pre_p,
>
>    examine_argument (nat_mode, type, 0, &needed_intregs, &needed_sseregs);
>
> -  need_temp = (!REG_P (container)
> +  bool container_in_reg = false;
> +  if (REG_P (container))
> +    container_in_reg = true;
> +  else if (GET_CODE (container) == PARALLEL
> +	   && GET_MODE (container) == BLKmode
> +	   && XVECLEN (container, 0) == 1)
> +    {
> +      /* Check if it is a PARALLEL BLKmode container of an EXPR_LIST
> +	 expression in a TImode register.  In this case, temp isn't
> +	 needed.  Otherwise, the TImode variable will be put in the
> +	 GPR save area which guarantees only 8-byte alignment.  */
> +      rtx x = XVECEXP (container, 0, 0);
> +      if (GET_CODE (x) == EXPR_LIST
> +	  && REG_P (XEXP (x, 0))
> +	  && XEXP (x, 1) == const0_rtx)
> +	container_in_reg = true;
> +    }
> +
> +  need_temp = (!container_in_reg
>	       && ((needed_intregs && TYPE_ALIGN (type) > 64)
>		   || TYPE_ALIGN (type) > 128));
>
>    /* In case we are passing structure, verify that it is consecutive block
>       on the register save area.  If not we need to do moves.  */
> -  if (!need_temp && !REG_P (container))
> +  if (!need_temp && !container_in_reg)
>      {
>        /* Verify that all registers are strictly consecutive */
>        if (SSE_REGNO_P (REGNO (XEXP (XVECEXP (container, 0, 0), 0))))
> diff --git a/gcc/testsuite/gcc.target/i386/pr116621.c
> b/gcc/testsuite/gcc.target/i386/pr116621.c
> new file mode 100644
> index 000..704266458a8
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr116621.c
> @@ -0,0 +1,43 @@
> +/* { dg-do run } */
> +/* { dg-options "-O2" } */
> +
> +#include <stdarg.h>
> +#include <string.h>
> +
> +union S8302
> +{
> +  union
> +  {
> +    double b;
> +    int c;
> +  } a;
> +  long double d;
> +  unsigned short int f[5];
> +};
> +
> +union S8302 s8302;
> +extern void check8302va (int i, ...);
> +
> +int
> +main (void)
> +{
> +  memset (&s8302, '\0', sizeof (s8302));
> +  s8302.a.b = -221438.25;
> +  check8302va (1, s8302);
> +  return 0;
> +}
> +
> +__attribute__((noinline, noclone))
> +void
> +check8302va (int z, ...)
> +{
> +  union S8302 arg, *p;
> +  va_list ap;
> +
> +  __builtin_va_start (ap, z);
> +  p = &s8302;
> +  arg = __builtin_va_arg (ap, union S8302);
> +  if (p->a.b != arg.a.b)
> +    __builtin_abort ();
> +  __builtin_va_end (ap);
> +}
> --
> 2.46.0
>
Re: [PATCH v1] Provide new GCC builtin __builtin_get_counted_by [PR116016]
On Fri, Sep 6, 2024 at 10:50 PM Martin Uecker wrote:
>
> Am Freitag, dem 06.09.2024 um 13:59 -0700 schrieb Bill Wendling:
> > On Fri, Sep 6, 2024 at 12:32 PM Martin Uecker wrote:
> > >
> > > Am Freitag, dem 06.09.2024 um 13:59 +0000 schrieb Qing Zhao:
> > > >
> > > > > On Sep 5, 2024, at 18:22, Bill Wendling wrote:
> > > > >
> > > > > Hi Qing,
> > > > >
> > > > > Sorry for my late reply.
> > > > >
> > > > > On Thu, Aug 29, 2024 at 7:22 AM Qing Zhao wrote:
> > > > > >
> > > > > > Hi,
> > > > > >
> > > > > > Thanks for the information.
> > > > > >
> > > > > > Yes, providing a unary operator similar as __counted_by(PTR) as
> > > > > > suggested by multiple people previously is a cleaner approach.
> > > > > >
> > > > > > Then the programmer will use the following:
> > > > > >
> > > > > > __builtin_choose_expr(
> > > > > >   __builtin_has_attribute (__p->FAM, "counted_by"),
> > > > > >   __builtin_get_counted_by(__p->FAM) = COUNT, 0);
> > > > > >
> > > > > > From the programmer's point of view, it's cleaner too.
> > > > > >
> > > > > > However, there is one issue with "__builtin_choose_expr" currently
> > > > > > in GCC; its documentation explicitly mentions this limitation:
> > > > > > (https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html#index-_005f_005fbuiltin_005fchoose_005fexpr)
> > > > > >
> > > > > > "Note: This construct is only available for C. Furthermore, the
> > > > > > unused expression (exp1 or exp2 depending on the value of
> > > > > > const_exp) may still generate syntax errors. This may change in
> > > > > > future revisions."
> > > > > >
> > > > > > So, due to this limitation, when there is no counted_by attribute,
> > > > > > the __builtin_get_counted_by() is still evaluated by the compiler,
> > > > > > an error is issued, and the compilation stops, as this small test
> > > > > > case shows:
> > > > > >
> > > > > > [opc@qinzhao-ol8u3-x86 gcc]$ cat ttt.c
> > > > > >
> > > > > > struct flex {
> > > > > >   unsigned int b;
> > > > > >   int c[];
> > > > > > } *array_flex;
> > > > > >
> > > > > > #define MY_ALLOC(P, FAM, COUNT) ({ \
> > > > > >   typeof(P) __p; \
> > > > > >   unsigned int __size = sizeof(*P) + sizeof(*P->FAM) * COUNT; \
> > > > > >   __p = (typeof(P)) __builtin_malloc(__size); \
> > > > > >   __builtin_choose_expr( \
> > > > > >     __builtin_has_attribute (__p->FAM, counted_by), \
> > > > > >     __builtin_counted_by_ref(__p->FAM) = COUNT, 0); \
> > > > > >   P = __p; \
> > > > > > })
> > > > > >
> > > > > > int main(int argc, char *argv[])
> > > > > > {
> > > > > >   MY_ALLOC(array_flex, c, 20);
> > > > > >   return 0;
> > > > > > }
> > > > > > [opc@qinzhao-ol8u3-x86 gcc]$ sh t
> > > > > > ttt.c: In function 'main':
> > > > > > ttt.c:13:5: error: the argument must have 'counted_by' attribute
> > > > > > '__builtin_counted_by_ref'
> > > > > > ttt.c:19:3: note: in expansion of macro 'MY_ALLOC'
> > > > > >
> > > > > > I checked the FE code on handling "__builtin_choose_expr"; yes, it
> > > > > > does parse the __builtin_counted_by_ref(__p->FAM) even when
> > > > > > __builtin_has_attribute(__p->FAM, counted_by) is FALSE, and issues
> > > > > > the error when parsing __builtin_counted_by_ref and stops the
> > > > > > compilation.
> > > > > >
> > > > > > So, in order to support this approach, we first must fix this issue
> > > > > > in the current __builtin_choose_expr in GCC. Otherwise, it's
> > > > > > impossible for the user to use this new builtin.
> > > > > >
> > > > > > Let me know your comments and suggestions.
> > > > > >
> > > > > Do you need to emit a diagnostic if the FAM doesn't have the
> > > > > counted_by attribute? It was originally supposed to "silently fail" if
> > > > > it didn't. We may need to do the same for Clang if so.
> > > >
> > > > Yes, "silently fail" should work around this problem if fixing the
> > > > issue in the current __builtin_choose_expr is too complicated.
> > > >
> > > > I will study a little bit on how to fix the issue in
> > > > __builtin_choose_expr first.
> > > >
> > > > Martin and Joseph, any comment or suggestion from you?
> > >
> > > My recommendation would be not to change __builtin_choose_expr.
> > >
> > > The design where __builtin_get_counted_by returns a null
> > > pointer constant (void*)0 seems good. Most users will
> > > get an error, which I think is what we want, and for those
> > > who want it to work even if the attribute is not there, the
> > > following code seems perfectly acceptable to me:
> > >
> > > auto p = __builtin_get_counted_by(__p->FAM)
> > > *_Generic(p, void*: &(int){}, default: p) = 1;
> > >
> > > Kees also seemed happy with it. And if I understood it correctly,
> > > also Clang's bounds checking people can work with this.
> > >
> > The problem with this is explained in the Clang RFC [1]. Apple's team
> > rejects taking the address of the 'counter' field when using
> > -fbounds-safety. They suggested this as an alternative:
> >
> > __builtin_bounds_attr_a
Re: [PATCH v1] Provide new GCC builtin __builtin_get_counted_by [PR116016]
On Fri, Sep 6, 2024 at 5:12 PM Qing Zhao wrote:
>
> Now, if
>
> 1. __builtin_get_counted_by should return a LVALUE instead of a pointer
> (required by CLANG's design)
> And
> 2. It's better not to change the behavior of __builtin_choose_expr.
>
> Then the solution left is:
>
> __builtin_get_counted_by (p->FAM) returns a LVALUE as p->COUNT if p->FAM has
> a counted_by attribute; if p->FAM does not have a counted_by attribute,
> silently do nothing. (Or just issue a warning if Linux is okay with such a
> warning.)
>
> Is this acceptable?
>
We might have to finagle things a bit better, as the compiler will probably
be annoyed at whatever gets generated by the code when the 'counted_by'
attribute isn't there. For example, do we implicitly change it into
'*(void *)0 = COUNT;' and expect that the '__builtin_has_attribute' check
was already done? Do we remove the assignment expression entirely? If this
is all happening inside '__builtin_choose_expr' (or '_Generic'?) then it
might not be too much of a concern. But if it's outside of that builtin,
trouble? I don't rightly know the answers to those questions. I'm not a big
fan of turning an assignment expression into a no-op outside of
'__builtin_choose_expr'... Let me think about it some more.

If all else fails and we need that warning, we could always add a `-Wno-`
to Linux. They have a long history of hating compiler warnings, even in
cases where they explicitly *ask* for the warnings, so it's not likely to
run into a NACK or anything...

> thanks.
>
> Qing
>
> > On Sep 6, 2024, at 16:59, Bill Wendling wrote:
> >
> > On Fri, Sep 6, 2024 at 12:32 PM Martin Uecker wrote:
> > >
> > > Am Freitag, dem 06.09.2024 um 13:59 +0000 schrieb Qing Zhao:
> > > >
> > > > > On Sep 5, 2024, at 18:22, Bill Wendling wrote:
> > > > >
> > > > > Hi Qing,
> > > > >
> > > > > Sorry for my late reply.
> > > > >
> > > > > On Thu, Aug 29, 2024 at 7:22 AM Qing Zhao wrote:
> > > > > >
> > > > > > Hi,
> > > > > >
> > > > > > Thanks for the information.
> > > > > >
> > > > > > Yes, providing a unary operator similar as __counted_by(PTR) as
> > > > > > suggested by multiple people previously is a cleaner approach.
> > > > > >
> > > > > > Then the programmer will use the following:
> > > > > >
> > > > > > __builtin_choose_expr(
> > > > > >   __builtin_has_attribute (__p->FAM, "counted_by"),
> > > > > >   __builtin_get_counted_by(__p->FAM) = COUNT, 0);
> > > > > >
> > > > > > From the programmer's point of view, it's cleaner too.
> > > > > >
> > > > > > However, there is one issue with "__builtin_choose_expr" currently
> > > > > > in GCC; its documentation explicitly mentions this limitation:
> > > > > > (https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html#index-_005f_005fbuiltin_005fchoose_005fexpr)
> > > > > >
> > > > > > "Note: This construct is only available for C. Furthermore, the
> > > > > > unused expression (exp1 or exp2 depending on the value of
> > > > > > const_exp) may still generate syntax errors. This may change in
> > > > > > future revisions."
> > > > > >
> > > > > > So, due to this limitation, when there is no counted_by attribute,
> > > > > > the __builtin_get_counted_by() is still evaluated by the compiler,
> > > > > > an error is issued, and the compilation stops, as this small test
> > > > > > case shows:
> > > > > >
> > > > > > [opc@qinzhao-ol8u3-x86 gcc]$ cat ttt.c
> > > > > >
> > > > > > struct flex {
> > > > > >   unsigned int b;
> > > > > >   int c[];
> > > > > > } *array_flex;
> > > > > >
> > > > > > #define MY_ALLOC(P, FAM, COUNT) ({ \
> > > > > >   typeof(P) __p; \
> > > > > >   unsigned int __size = sizeof(*P) + sizeof(*P->FAM) * COUNT; \
> > > > > >   __p = (typeof(P)) __builtin_malloc(__size); \
> > > > > >   __builtin_choose_expr( \
> > > > > >     __builtin_has_attribute (__p->FAM, counted_by), \
> > > > > >     __builtin_counted_by_ref(__p->FAM) = COUNT, 0); \
> > > > > >   P = __p; \
> > > > > > })
> > > > > >
> > > > > > int main(int argc, char *argv[])
> > > > > > {
> > > > > >   MY_ALLOC(array_flex, c, 20);
> > > > > >   return 0;
> > > > > > }
> > > > > > [opc@qinzhao-ol8u3-x86 gcc]$ sh t
> > > > > > ttt.c: In function 'main':
> > > > > > ttt.c:13:5: error: the argument must have 'counted_by' attribute
> > > > > > '__builtin_counted_by_ref'
> > > > > > ttt.c:19:3: note: in expansion of macro 'MY_ALLOC'
> > > > > >
> > > > > > I checked the FE code on handling "__builtin_choose_expr"; yes, it
> > > > > > does parse the __builtin_counted_by_ref(__p->FAM) even when
> > > > > > __builtin_has_attribute(__p->FAM, counted_by) is FALSE, and issues
> > > > > > the error when parsing __builtin_counted_by_ref and stops the
> > > > > > compilation.
> > > > > >
> > > > > > So, in order to support this approach, we first must fix this issue
> > > > > > in the current __builtin_choose_expr in GCC. Otherwise, it's
> > > > > > impossible for the user to use this new builtin.
> > > > > >
> > > > > > Let me know your comments and suggestions.
> > > > > >
> > > > > Do you need to emit a diagnostic if the FAM doesn't have the
> > > > > counted_by attribute? It was originally supposed to "silently fail" if
> > > > > it didn't. We may need to do the same for Clang if so.
> > > >
> > > > Yes, "silently fail" should work around this problem if fixing the
> > > > issue in the current __bui
Re: [PATCH v1] Provide new GCC builtin __builtin_get_counted_by [PR116016]
On Fri, Sep 6, 2024 at 11:16 PM Martin Uecker wrote:
>
> Am Samstag, dem 07.09.2024 um 00:12 +0000 schrieb Qing Zhao:
> > Now, if
> >
> > 1. __builtin_get_counted_by should return a LVALUE instead of a pointer
> > (required by CLANG's design)
> > And
> > 2. It's better not to change the behavior of __builtin_choose_expr.
> >
> > Then the solution left is:
> >
> > __builtin_get_counted_by (p->FAM) returns a LVALUE as p->COUNT if p->FAM
> > has a counted_by attribute; if p->FAM does not have a counted_by
> > attribute, silently do nothing. (Or just issue a warning if Linux is
> > okay with such a warning.)
>
> What does "silently do nothing" mean?
>
> /* do nothing */ = counter;
>
> will still fail to compile. So I guess you have something
> else in mind?
>
> The new _Generic selection also works if you return an
> lvalue of type void:
>
> struct foo x;
> _Generic(typeof(__counted_by(&x)), void: (int){ 0 },
>          default: __counted_by(&x)) = 10;
>
> https://godbolt.org/z/41E3oj84o
>
> So why not do this then?
>
I'm for any solution that a.) works with the fewest changes needed to other
builtins, and b.) isn't super convoluted. The '_Generic' looks like a
potential compromise.

-bw
[PATCH] Testing: fail vectorization when not using SLP
While I know the effect on x86-64, the following patch post is to trigger CI
for disabling non-SLP vectorization, to look for arm/aarch64 and RISC-V
specific failure modes.

	* tree-vect-stmts.cc (vect_analyze_stmt): Fail vectorization for
	non-SLP stmts.
---
 gcc/tree-vect-stmts.cc | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index f6c5b7a7e87..cfb1066fc96 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -13351,6 +13351,10 @@ vect_analyze_stmt (vec_info *vinfo,
       return opt_result::success ();
     }
 
+  if (!node)
+    return opt_result::failure_at (stmt_info->stmt,
+				   "needs non-SLP handling\n");
+
   ok = true;
   if (!bb_vinfo
       && (STMT_VINFO_RELEVANT_P (stmt_info)
-- 
2.43.0
Re: [PATCH v1] Provide new GCC builtin __builtin_get_counted_by [PR116016]
Am Sonntag, dem 08.09.2024 um 02:26 -0700 schrieb Bill Wendling:
> On Sat, Sep 7, 2024 at 12:21 AM Jakub Jelinek wrote:
> > On Sat, Sep 07, 2024 at 07:50:29AM +0200, Martin Uecker wrote:
> > > > 2. Apple's implementation supports expressions in the '__counted_by'
> > > > attribute, thus the 'count' may be an R-value that can't have its
> > > > address taken.
> > > >
> > > > [1] https://discourse.llvm.org/t/rfc-introducing-new-clang-builtin-builtin-get-counted-by/80836/
> > >
> > > Yes, this would be a limitation.
> >
> > But then you really can't return an lvalue either; all you can return is
> > an rvalue in that case. So just return a (void*)0 in that case.
> >
> The builtin will return an r-value regardless. The idea is that if the
> counter can't be used as an l-value, we wouldn't add an implicit
> r-value -> l-value cast (an internal Clang AST node). In the cases
> where it can be used as an l-value, we add the implicit cast. I
> imagine GCC has some kind of analogue to that? The '(void *)0' case
> could be reserved only for when the builtin is used as an l-value but
> the r-value can't be cast as an l-value...

I do not know what an r-value -> l-value cast is or why Clang needs it.
I guess it back-converts an r-value which came from an l-value to an
lvalue...

The issue with returning an r-value when the counter expression is not an
lvalue referring to a struct member is that the user would also need yet
another new mechanism to detect this case at compile-time.

Martin
Re: [PATCH v1] Provide new GCC builtin __builtin_get_counted_by [PR116016]
Am Sonntag, dem 08.09.2024 um 02:09 -0700 schrieb Bill Wendling:
> On Fri, Sep 6, 2024 at 10:50 PM Martin Uecker wrote:
> >
> > Am Freitag, dem 06.09.2024 um 13:59 -0700 schrieb Bill Wendling:
> > > On Fri, Sep 6, 2024 at 12:32 PM Martin Uecker wrote:
> > > >
> > > > ...
> > > >
> > > > My recommendation would be not to change __builtin_choose_expr.
> > > >
> > > > The design where __builtin_get_counted_by returns a null
> > > > pointer constant (void*)0 seems good. Most users will
> > > > get an error, which I think is what we want, and for those
> > > > who want it to work even if the attribute is not there, the
> > > > following code seems perfectly acceptable to me:
> > > >
> > > > auto p = __builtin_get_counted_by(__p->FAM)
> > > > *_Generic(p, void*: &(int){}, default: p) = 1;
> > > >
> > > > Kees also seemed happy with it. And if I understood it correctly,
> > > > also Clang's bounds checking people can work with this.
> > > >
> > > The problem with this is explained in the Clang RFC [1]. Apple's team
> > > rejects taking the address of the 'counter' field when using
> > > -fbounds-safety. They suggested this as an alternative:
> > >
> > > __builtin_bounds_attr_arg(ptr->FAM) = COUNT;
> > >
> > > The __builtin_bounds_attr_arg(ptr->FAM) is replaced by an L-value to
> > > the 'ptr->count' field during SEMA, and life goes on as normal. There
> > > are a few reasons for this:
> > >
> > > 1. They want to track the relationship between the FAM and the
> > > counter so that one isn't modified without the other being modified.
> > > Allowing the address to be taken makes that check vastly harder.
> >
> > In the thread it is pointed out that returning a pointer works
> > too, and they would simply restrict passing the pointer elsewhere.
> >
> It's in rapidsna's initial reply titled "Taking the address of a count
> variable is forbidden":
>
> https://discourse.llvm.org/t/rfc-introducing-new-clang-builtin-builtin-get-counted-by/80836/2
>
> I didn't see her retreating from that idea.

The initial proposal already has this as "Alternative Design" and then
there is this response to my comment:

https://discourse.llvm.org/t/rfc-introducing-new-clang-builtin-builtin-get-counted-by/80836/27

which seems to imply that she is open to this idea.

> > I can't see "Apple's team rejects" and "vastly harder" at the
> > moment.
> >
> > > 2. Apple's implementation supports expressions in the '__counted_by'
> > > attribute, thus the 'count' may be an R-value that can't have its
> > > address taken.
> > >
> > > [1] https://discourse.llvm.org/t/rfc-introducing-new-clang-builtin-builtin-get-counted-by/80836/
> >
> > Yes, this would be a limitation.
> >
> And not one that I'm particularly excited about (allowing for (nearly)
> arbitrary expressions in the 'counted_by' attribute, that is).

And I do not see how returning an r-value can work with the intended use
case for __builtin_get_counted_by() anyway, because this would break
assignments.

So I do not see any issue against returning a pointer, and (void*)0 when
there is no attribute or the expression is an r-value.

Martin
[patch,reload,v2] PR116326 Introduce RELOAD_ELIMINABLE_REGS
The reason for PR116326 is that LRA and reload require different
ELIMINABLE_REGS for a multi-register frame pointer. As ELIMINABLE_REGS is
used to initialize static const objects, it is not possible to make
ELIMINABLE_REGS dependent on -mlra.

It was also concluded that it is not desirable to adjust reload so that it
behaves like LRA, see
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116326#c8

This patch adds a new macro RELOAD_ELIMINABLE_REGS that takes precedence
over ELIMINABLE_REGS in reload1.cc. The patch was proposed by H.J. Lu and
is only required for trunk.

Johann

--

reload1.cc: rtl-optimization/116326 - Use RELOAD_ELIMINABLE_REGS.

The new macro is required because reload and LRA are using different
representations for a multi-register frame pointer. As ELIMINABLE_REGS
is used to initialize static const objects, it can't depend on -mlra.

	PR rtl-optimization/116326
gcc/
	* reload1.cc (reg_eliminate_1): Initialize from
	RELOAD_ELIMINABLE_REGS if defined.
	* config/avr/avr.h (RELOAD_ELIMINABLE_REGS): Copy from
	ELIMINABLE_REGS.
	(ELIMINABLE_REGS): Don't mention sub-regnos of the frame pointer.
gcc/testsuite/
	* gcc.target/avr/torture/lra-pr116324.c: New test.
	* gcc.target/avr/torture/lra-pr116325.c: New test.
diff --git a/gcc/config/avr/avr.h b/gcc/config/avr/avr.h
index 1cf4180e534..3fa2ee76c43 100644
--- a/gcc/config/avr/avr.h
+++ b/gcc/config/avr/avr.h
@@ -308,12 +308,19 @@ enum reg_class {
 
 #define STATIC_CHAIN_REGNUM ((AVR_TINY) ? 18 : 2)
 
-#define ELIMINABLE_REGS {				\
+#define RELOAD_ELIMINABLE_REGS {			\
     { ARG_POINTER_REGNUM, STACK_POINTER_REGNUM },	\
     { ARG_POINTER_REGNUM, FRAME_POINTER_REGNUM },	\
     { FRAME_POINTER_REGNUM, STACK_POINTER_REGNUM },	\
     { FRAME_POINTER_REGNUM + 1, STACK_POINTER_REGNUM + 1 } }
 
+#define ELIMINABLE_REGS					\
+  {							\
+    { ARG_POINTER_REGNUM, STACK_POINTER_REGNUM },	\
+    { ARG_POINTER_REGNUM, FRAME_POINTER_REGNUM },	\
+    { FRAME_POINTER_REGNUM, STACK_POINTER_REGNUM }	\
+  }
+
 #define INITIAL_ELIMINATION_OFFSET(FROM, TO, OFFSET)	\
   OFFSET = avr_initial_elimination_offset (FROM, TO)
 
diff --git a/gcc/reload1.cc b/gcc/reload1.cc
index 2e059b09970..b0ae64e10b2 100644
--- a/gcc/reload1.cc
+++ b/gcc/reload1.cc
@@ -283,7 +283,13 @@ static const struct elim_table_1
   const int to;
 } reg_eliminate_1[] =
 
+  // Reload and LRA don't agree on how a multi-register frame pointer
+  // is represented for elimination.  See avr.h for a use case.
+#ifdef RELOAD_ELIMINABLE_REGS
+  RELOAD_ELIMINABLE_REGS;
+#else
   ELIMINABLE_REGS;
+#endif
 
 #define NUM_ELIMINABLE_REGS ARRAY_SIZE (reg_eliminate_1)
 
diff --git a/gcc/testsuite/gcc.target/avr/torture/lra-pr116324.c b/gcc/testsuite/gcc.target/avr/torture/lra-pr116324.c
new file mode 100644
index 000..b54eb402d8b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/avr/torture/lra-pr116324.c
@@ -0,0 +1,86 @@
+/* { dg-options { -std=gnu99 } } */
+
+void f7_clr (void *cc)
+{
+  __asm ("%~call __f7_clr_asm" :: "z" (cc) : "memory");
+}
+
+void* f7_copy (void *cc, const void *aa)
+{
+  extern void __f7_copy_asm (void);
+  __asm ("%~call __f7_copy_asm" :: "z" (cc), "x" (aa) : "memory");
+  return cc;
+}
+
+typedef _Bool bool;
+typedef unsigned int uint16_t;
+typedef unsigned char uint8_t;
+typedef int int16_t;
+
+typedef struct f7_t
+{
+  union
+  {
+    struct
+    {
+      uint8_t sign :1;
+      uint8_t reserved1 :1;
+      uint8_t is_nan :1;
+      uint8_t reserved2 :4;
+      uint8_t is_inf :1;
+    };
+    uint8_t flags;
+  };
+
+  uint8_t mant[7];
+  int16_t expo;
+} f7_t;
+
+
+static inline __attribute__((__always_inline__))
+void __f7_clr (f7_t *cc)
+{
+  extern void __f7_clr_asm (void);
+  __asm ("%~call %x[f]"
+	 :
+	 : [f] "i" (__f7_clr_asm), "z" (cc)
+	 : "memory");
+}
+
+static inline __attribute__((__always_inline__))
+bool __f7_signbit (const f7_t *aa)
+{
+  return aa->flags & (1 << 0);
+}
+
+static inline __attribute__((__always_inline__))
+int16_t sub_ssat16 (int16_t a, int16_t b)
+{
+  _Sat _Fract sa = __builtin_avr_rbits (a);
+  _Sat _Fract sb = __builtin_avr_rbits (b);
+  return __builtin_avr_bitsr (sa - sb);
+}
+
+extern void _
RE: [PATCH] RISC-V: Fix ICE for rvv in lto
I see, I can reproduce this when building with "-march=rv64gcv -mabi=lp64d
-flto -O0 test.c -o test.elf".

#include <riscv_vector.h>

int main ()
{
  size_t vl = 8;
  vint32m1_t vs1 = {};
  vint32m1_t vs2 = {};
  vint32m1_t vd = __riscv_vadd_vv_i32m1(vs1, vs2, vl);
  return (int)&vd;
}

Pan

-----Original Message-----
From: Jin Ma
Sent: Sunday, September 8, 2024 1:15 PM
To: Li, Pan2 ; gcc-patches@gcc.gnu.org
Cc: jeffreya...@gmail.com; juzhe.zh...@rivai.ai; kito.ch...@gmail.com;
jinma.cont...@gmail.com
Subject: Re: [PATCH] RISC-V: Fix ICE for rvv in lto

> > #include <riscv_vector.h>
> >
> > vint32m1_t foo(vint32m1_t vs1, vint32m1_t vs2, size_t vl)
> > {
> >   return __riscv_vadd_vv_i32m1(vs1, vs2, vl);
> > }
>
> To double confirm, you mean "riscv64-linux-gnu-gcc-14 -march=rv64gcv
> -mabi=lp64d -flto -O0 tmp.c -c -S -o -" with above is able to reproduce
> this ICE?
>
> Pan

Not too accurate, please don't add "-S" or "-c"; let the compilation go to
the linker and try to generate the binary. The normal result of compilation
should be to throw an error that the main function cannot be found, but
unfortunately an ICE appears.

By the way, the gcc-14 in my environment is built on releases/gcc-14, I
didn't download any compiled gcc. Of course, it is also possible that my
local environment is broken, and I will check it again.

BR
Jin
[PATCH] configure: fix typos
Fix typos in comments.

Signed-off-by: Andrew Kreimer
---
 configure    | 2 +-
 configure.ac | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/configure b/configure
index e9583f2ba0c..6a77d454fd5 100755
--- a/configure
+++ b/configure
@@ -9086,7 +9086,7 @@ if test -d ${srcdir}/gcc; then
 	lang_requires_boot_languages=
 	# set srcdir during sourcing lang_frag to the gcc dir.
 	# Sadly overriding srcdir on the . line doesn't work in plain sh as it
-	# polutes this shell
+	# pollutes this shell
 	saved_srcdir=${srcdir}
 	srcdir=${srcdir}/gcc . ${lang_frag}
 	srcdir=${saved_srcdir}
diff --git a/configure.ac b/configure.ac
index f61dbe64a94..2567757e74d 100644
--- a/configure.ac
+++ b/configure.ac
@@ -2136,7 +2136,7 @@ if test -d ${srcdir}/gcc; then
 	lang_requires_boot_languages=
 	# set srcdir during sourcing lang_frag to the gcc dir.
 	# Sadly overriding srcdir on the . line doesn't work in plain sh as it
-	# polutes this shell
+	# pollutes this shell
 	saved_srcdir=${srcdir}
 	srcdir=${srcdir}/gcc . ${lang_frag}
 	srcdir=${saved_srcdir}
-- 
2.46.0
Re: [PATCH v3 08/12] OpenMP: Reject other properties with kind(any)
On 8/16/24 06:58, Jakub Jelinek wrote:
> If this can apply (perhaps with small fuzz) to vanilla trunk, guess it
> can be committed right now, doesn't need to wait for the rest of the
> metadirective patch set.  OK.

I've tested the slightly cleaned-up version of the patch which is attached;
I'll push it in the next day or two if there is no further objection.

-Sandra

From 23a82bea26805f2a430354d56b444d5bb7eaed3f Mon Sep 17 00:00:00 2001
From: Sandra Loosemore
Date: Fri, 6 Sep 2024 20:58:13 +0000
Subject: [PATCH] OpenMP: Reject other properties with kind(any)

TR13 (pre-6.0) of the OpenMP spec says:

"If trait-property any is specified in the kind trait-selector of the
device selector set or the target_device selector sets, no other
trait-property may be specified in the same selector set."

This restriction dates back to OpenMP 5.1 (where it had slightly different
wording). GCC's implementation was based on the 5.0 spec, and several
testcases include selectors that are now considered invalid. This patch
adds a diagnostic and fixes the testcases.

gcc/ChangeLog
	* omp-general.cc (omp_check_context_selector): Reject other
	properties in the same selector set with kind(any).

gcc/testsuite/ChangeLog
	* c-c++-common/gomp/declare-variant-10.c: Fix broken tests.
	* c-c++-common/gomp/declare-variant-3.c: Likewise.
	* c-c++-common/gomp/declare-variant-9.c: Likewise.
	* c-c++-common/gomp/declare-variant-any.c: New.
	* gfortran.dg/gomp/declare-variant-10.f90: Fix broken tests.
	* gfortran.dg/gomp/declare-variant-3.f90: Likewise.
	* gfortran.dg/gomp/declare-variant-9.f90: Likewise.
	* gfortran.dg/gomp/declare-variant-any.f90: New.
---
 gcc/omp-general.cc                            | 35 +++++++++++++++
 .../c-c++-common/gomp/declare-variant-10.c    |  4 +--
 .../c-c++-common/gomp/declare-variant-3.c     | 10 ++----
 .../c-c++-common/gomp/declare-variant-9.c     |  4 +--
 .../c-c++-common/gomp/declare-variant-any.c   | 10 ++++++
 .../gfortran.dg/gomp/declare-variant-10.f90   |  4 +--
 .../gfortran.dg/gomp/declare-variant-3.f90    | 12 ++++---
 .../gfortran.dg/gomp/declare-variant-9.f90    |  2 +-
 .../gfortran.dg/gomp/declare-variant-any.f90  | 28 ++++++++++++
 9 files changed, 86 insertions(+), 23 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/gomp/declare-variant-any.c
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/declare-variant-any.f90

diff --git a/gcc/omp-general.cc b/gcc/omp-general.cc
index 0b61335dba4..aa53c00bec5 100644
--- a/gcc/omp-general.cc
+++ b/gcc/omp-general.cc
@@ -1288,6 +1288,8 @@ omp_check_context_selector (location_t loc, tree ctx)
   for (tree tss = ctx; tss; tss = TREE_CHAIN (tss))
     {
       enum omp_tss_code tss_code = OMP_TSS_CODE (tss);
+      bool saw_any_prop = false;
+      bool saw_other_prop = false;
 
       /* We can parse this, but not handle it yet.  */
       if (tss_code == OMP_TRAIT_SET_TARGET_DEVICE)
@@ -1324,6 +1326,31 @@ omp_check_context_selector (location_t loc, tree ctx)
 	  else
 	    ts_seen[ts_code] = true;
 
+	  /* If trait-property "any" is specified in the "kind"
+	     trait-selector of the "device" selector set or the
+	     "target_device" selector sets, no other trait-property
+	     may be specified in the same selector set.  */
+	  if (ts_code == OMP_TRAIT_DEVICE_KIND)
+	    for (tree p = OMP_TS_PROPERTIES (ts); p; p = TREE_CHAIN (p))
+	      {
+		const char *prop = omp_context_name_list_prop (p);
+		if (!prop)
+		  continue;
+		else if (strcmp (prop, "any") == 0)
+		  saw_any_prop = true;
+		else
+		  saw_other_prop = true;
+	      }
+	  /* It seems slightly suspicious that the spec's language covers
+	     the device_num selector too, but
+	     target_device={device_num(whatever),kind(any)}
+	     is probably not terribly useful anyway.  */
+	  else if (ts_code == OMP_TRAIT_DEVICE_ARCH
+		   || ts_code == OMP_TRAIT_DEVICE_ISA
+		   || ts_code == OMP_TRAIT_DEVICE_NUM)
+	    saw_other_prop = true;
+
 	  if (omp_ts_map[ts_code].valid_properties == NULL)
 	    continue;
 
@@ -1376,6 +1403,14 @@ omp_check_context_selector (location_t loc, tree ctx)
 	      break;
 	    }
 	}
+
+      if (saw_any_prop && saw_other_prop)
+	{
+	  error_at (loc,
+		    "no other trait-property may be specified "
+		    "in the same selector set with %<kind(any)%>");
+	  return error_mark_node;
+	}
     }
   return ctx;
 }
diff --git a/gcc/testsuite/c-c++-common/gomp/declare-variant-10.c b/gcc/testsuite/c-c++-common/gomp/declare-variant-10.c
index 2b8a39425b1..e77693430d1 100644
--- a/gcc/testsuite/c-c++-common/gomp/declare-variant-10.c
+++ b/gcc/testsuite/c-c++-common/gomp/declare-variant-10.c
@@ -7,7 +7,7 @@ void f01 (void);
 #pragma omp declare variant (f01) match (device={isa(avx512f,avx512bw)})
 void f02 (void);
 void f03 (void);
-#pragma omp declare variant (f03) match (device={kind("any"),arch(x86_64),isa(avx512f,avx512bw)})
+#pragma omp declare variant (f03) match (device={arch(x86_64),isa(avx512f,avx512bw)})
 void f04 (void);
 void f05 (void);
 #pragma om
[PATCH] phiopt: Small refactoring/cleanup of non-ssa name case of factor_out_conditional_operation
This small cleanup removes a redundant check for gimple_assign_cast_p and
reformats based on that. It also changes the if statement that checks
whether the type is integral and whether the constant fits into the new
type so that it returns NULL early, and reformats based on that. It also
moves the check for has_single_use earlier, as it is less complex and still
a cheaper check than some of the others (like the check on the integer
side).

This was noticed when adding a few new things to
factor_out_conditional_operation, but those are not ready to submit yet.

Note there is no functional difference with this change.

Bootstrapped and tested on x86_64-linux-gnu.

gcc/ChangeLog:

	* tree-ssa-phiopt.cc (factor_out_conditional_operation): Move
	the has_single_use checks much earlier. Remove redundant check
	for gimple_assign_cast_p. Change around the check if the
	integral consts fits into the new type.

Signed-off-by: Andrew Pinski
---
 gcc/tree-ssa-phiopt.cc | 122 ++++++++++++++++++++---------------------
 1 file changed, 60 insertions(+), 62 deletions(-)

diff --git a/gcc/tree-ssa-phiopt.cc b/gcc/tree-ssa-phiopt.cc
index 271a5d51f09..06ec5875722 100644
--- a/gcc/tree-ssa-phiopt.cc
+++ b/gcc/tree-ssa-phiopt.cc
@@ -265,6 +265,11 @@ factor_out_conditional_operation (edge e0, edge e1, gphi *phi,
   tree new_arg0 = arg0_op.ops[0];
   tree new_arg1;
 
+  /* If arg0 have > 1 use, then this transformation actually increases
+     the number of expressions evaluated at runtime.  */
+  if (!has_single_use (arg0))
+    return NULL;
+
   if (TREE_CODE (arg1) == SSA_NAME)
     {
       /* Check if arg1 is an SSA_NAME.  */
@@ -278,6 +283,11 @@ factor_out_conditional_operation (edge e0, edge e1, gphi *phi,
       if (arg1_op.operands_occurs_in_abnormal_phi ())
	return NULL;
 
+      /* If arg1 have > 1 use, then this transformation actually increases
+	 the number of expressions evaluated at runtime.  */
+      if (!has_single_use (arg1))
+	return NULL;
+
       /* Either arg1_def_stmt or arg0_def_stmt should be conditional.  */
       if (dominated_by_p (CDI_DOMINATORS, gimple_bb (phi),
			   gimple_bb (arg0_def_stmt))
	  && dominated_by_p (CDI_DOMINATORS,
@@ -295,80 +305,68 @@ factor_out_conditional_operation (edge e0, edge e1, gphi *phi,
       if (dominated_by_p (CDI_DOMINATORS, gimple_bb (phi),
			   gimple_bb (arg0_def_stmt)))
	return NULL;
 
-      /* If arg1 is an INTEGER_CST, fold it to new type.  */
-      if (INTEGRAL_TYPE_P (TREE_TYPE (new_arg0))
-	  && (int_fits_type_p (arg1, TREE_TYPE (new_arg0))
-	      || (TYPE_PRECISION (TREE_TYPE (new_arg0))
+      /* Only handle if arg1 is a INTEGER_CST and one that fits
+	 into the new type or if it is the same precision.  */
+      if (!INTEGRAL_TYPE_P (TREE_TYPE (new_arg0))
+	  || !(int_fits_type_p (arg1, TREE_TYPE (new_arg0))
+	       || (TYPE_PRECISION (TREE_TYPE (new_arg0))
		   == TYPE_PRECISION (TREE_TYPE (arg1)))))
+	return NULL;
+
+      /* For the INTEGER_CST case, we are just moving the
+	 conversion from one place to another, which can often
+	 hurt as the conversion moves further away from the
+	 statement that computes the value.  So, perform this
+	 only if new_arg0 is an operand of COND_STMT, or
+	 if arg0_def_stmt is the only non-debug stmt in
+	 its basic block, because then it is possible this
+	 could enable further optimizations (minmax replacement
+	 etc.).  See PR71016.
+	 Note no-op conversions don't have this issue as
+	 it will not generate any zero/sign extend in that case.  */
+      if ((TYPE_PRECISION (TREE_TYPE (new_arg0))
+	   != TYPE_PRECISION (TREE_TYPE (arg1)))
+	  && new_arg0 != gimple_cond_lhs (cond_stmt)
+	  && new_arg0 != gimple_cond_rhs (cond_stmt)
+	  && gimple_bb (arg0_def_stmt) == e0->src)
	{
-      if (gimple_assign_cast_p (arg0_def_stmt))
+	  gsi = gsi_for_stmt (arg0_def_stmt);
+	  gsi_prev_nondebug (&gsi);
+	  if (!gsi_end_p (gsi))
	    {
-	  /* For the INTEGER_CST case, we are just moving the
-	     conversion from one place to another, which can often
-	     hurt as the conversion moves further away from the
-	     statement that computes the value.  So, perform this
-	     only if new_arg0 is an operand of COND_STMT, or
-	     if arg0_def_stmt is the only non-debug stmt in
-	     its basic block, because then it is possible this
-	     could enable further optimizations (minmax replacement
-	     etc.).  See PR71016.
-	     Note no-op conversions don't have this issue as
-	     it will not generate any zero/sign extend in that case.  */
-	  if ((TYPE_PRECISION (TREE_TYPE (new_arg0))
-	       != TYPE_PRECISION (TREE_TYPE (arg1)))
-	      && new_arg0 != gimple_cond_lhs (cond_stm
Re: [PATCH 1/3] gcov: Cache source files
On 8/23/24 09:39, Jan Hubicka wrote: Hi, 1:4:int notmain(const char *entity) -: == inlined from hello.h == 1:6: if (s) branch 0 taken 0 (fallthrough) branch 1 taken 1 #:7:printf ("hello, %s!\n", s); %:7-block 3 call0 never executed -:8: else 1:9:printf ("hello, world!\n"); 1:9-block 4 call0 returned 1 1: 10: return 0; 1: 10-block 5 -: == inlined from hello.h (end) == -:5:{ 1:6: return hello (entity); 1:6-block 7 -:7:} This indeed looks like a reasonable goal. Thanks, pushed. gcc/ChangeLog: * gcov.cc (release_structures): Release source_lines. (slurp): New function. (output_lines): Read sources with slurp. --- gcc/gcov.cc | 70 - 1 file changed, 53 insertions(+), 17 deletions(-) diff --git a/gcc/gcov.cc b/gcc/gcov.cc index e76a314041c..19019f404ee 100644 --- a/gcc/gcov.cc +++ b/gcc/gcov.cc @@ -550,6 +550,11 @@ static vector names; a file being read multiple times. */ static vector processed_files; +/* The contents of a source file. The nth SOURCE_LINES entry is the + contents of the nth SOURCES, or empty if it has not or could not be + read. */ +static vector*> source_lines; + /* This holds data summary information. */ static unsigned object_runs; @@ -762,6 +767,8 @@ static string make_gcov_file_name (const char *, const char *); static char *mangle_name (const char *); static void release_structures (void); extern int main (int, char **); +static const vector& +slurp (const source_info &src, FILE *gcov_file, const char *line_start); function_info::function_info (): m_name (NULL), m_demangled_name (NULL), ident (0), lineno_checksum (0), cfg_checksum (0), has_catch (0), @@ -1804,6 +1811,15 @@ release_structures (void) it != functions.end (); it++) delete (*it); + for (vector *lines : source_lines) +{ + if (lines) + for (const char *line : *lines) + free (const_cast (line)); + delete (lines); +} + source_lines.resize (0); + for (fnfilter &filter : filters) regfree (&filter.regex); @@ -3246,6 +3262,41 @@ read_line (FILE *file) return pos ? 
string : NULL; } +/* Get the vector with the contents SRC, possibly from a cache. If + the reading fails, a message prefixed with LINE_START is written to + GCOV_FILE. */ +static const vector& +slurp (const source_info &src, FILE *gcov_file, + const char *line_start) +{ + if (source_lines.size () <= src.index) +source_lines.resize (src.index + 1); + + /* Store vector pointers so that the returned references remain + stable and won't be broken by successive calls to slurp. */ + if (!source_lines[src.index]) +source_lines[src.index] = new vector (); + + if (!source_lines[src.index]->empty ()) +return *source_lines[src.index]; + + FILE *source_file = fopen (src.name, "r"); + if (!source_file) +fnotice (stderr, "Cannot open source file %s\n", src.name); + else if (src.file_time == 0) +fprintf (gcov_file, "%sSource is newer than graph\n", line_start); + + const char *retval; + vector &lines = *source_lines[src.index]; + if (source_file) +while ((retval = read_line (source_file))) + lines.push_back (xstrdup (retval)); + + if (source_file) +fclose (source_file); + return lines; So this is basically going to read all sources needed for a single compilation unit (.cc file) into memory at once. I guess that is OK. I wonder if we don't want to do something like mmap the source and then set up the source_lines array instead of reading every line into a separate xstrduped chunk? But that could be done incrementally, so the patch is OK. Honza +} + /* Pad string S with spaces from left to have total width equal to 9. */ static void @@ -3435,9 +3486,6 @@ output_lines (FILE *gcov_file, const source_info *src) #define DEFAULT_LINE_START "-:0:" #define FN_SEPARATOR "--\n" - FILE *source_file; - const char *retval; - /* Print colorization legend.
*/ if (flag_use_colors) fprintf (gcov_file, "%s", @@ -3464,17 +3512,8 @@ output_lines (FILE *gcov_file, const source_info *src) fprintf (gcov_file, DEFAULT_LINE_START "Runs:%u\n", object_runs); } - source_file = fopen (src->name, "r"); - if (!source_file) -fnotice (stderr, "Cannot open source file %s\n", src->name); - else if (src->file_time == 0) -fprintf (gcov_file, DEFAULT_LINE_START "Source is newer than graph\n"); - - vector source_lines; - if (source_file) -while ((retval = read_line (source_file)) != NULL) - source_lines.push_back (xstrdup (retval)); - + const vector &source_lines = slurp (*src, gcov_file, + DEFAULT_LINE_START);
Re: [PATCH 1/4] testsuite: Use dg-compile, not gcc -c
On 8/23/24 09:41, Jan Hubicka wrote: Since this is a pure compile test it makes sense to inform dejagnu of it. gcc/testsuite/ChangeLog: * gcc.misc-tests/gcov-23.c: Use dg-compile, not gcc -c OK, Honza Thanks, Pushed. --- gcc/testsuite/gcc.misc-tests/gcov-23.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/gcc/testsuite/gcc.misc-tests/gcov-23.c b/gcc/testsuite/gcc.misc-tests/gcov-23.c index 72849d80e3a..72ba0aa1389 100644 --- a/gcc/testsuite/gcc.misc-tests/gcov-23.c +++ b/gcc/testsuite/gcc.misc-tests/gcov-23.c @@ -1,4 +1,5 @@ -/* { dg-options "-fcondition-coverage -ftest-coverage -O2 -c" } */ +/* { dg-options "-fcondition-coverage -ftest-coverage -O2" } */ +/* { dg-do compile } */ #include #include -- 2.39.2
[patch, fortran] Matmul and dot_product for unsigned
Hello world, like the subject says. The patch is gzipped because it is large; it contains multiple MATMUL library implementations. OK for trunk? Implement MATMUL and DOT_PRODUCT for unsigned. gcc/fortran/ChangeLog: * arith.cc (gfc_arith_uminus): Fix warning. (gfc_arith_minus): Correctly truncate unsigneds. * check.cc (gfc_check_dot_product): Handle unsigned arguments. (gfc_check_matmul): Likewise. * expr.cc (gfc_get_unsigned_expr): New function. * gfortran.h (gfc_get_unsigned_expr): Add prototype. * gfortran.texi: Document MATMUL and DOT_PRODUCT for unsigned. * simplify.cc (compute_dot_product): Handle unsigneds. libgfortran/ChangeLog: * Makefile.am: Add unsigned MATMUL versions. * Makefile.in: Regenerate. * gfortran.map: Add unsigned MATMUL versions. * libgfortran.h: Add array types for unsigned. * m4/iparm.m4: Add UNSIGNED if type is m. * generated/matmul_m1.c: New file. * generated/matmul_m16.c: New file. * generated/matmul_m2.c: New file. * generated/matmul_m4.c: New file. * generated/matmul_m8.c: New file. * generated/matmulavx128_m1.c: New file. * generated/matmulavx128_m16.c: New file. * generated/matmulavx128_m2.c: New file. * generated/matmulavx128_m4.c: New file. * generated/matmulavx128_m8.c: New file. gcc/testsuite/ChangeLog: * gfortran.dg/unsigned_25.f90: New test. * gfortran.dg/unsigned_26.f90: New test. 0001-Implement-MATMUL-and-DOT_PRODUCT-for-unsgigned.patch.gz Description: application/gzip
Re: [PATCH v2] RISC-V: Fixed incorrect semantic description in DF to DI pattern in the Zfa extension on rv32.
On Sat, Sep 7, 2024 at 7:08 PM Jeff Law wrote: > > > > On 9/6/24 5:32 AM, Jin Ma wrote: > > In the process of DF to SI, we generally use "unsigned_fix" rather than > > "truncate" for conversion. Although this has no effect in general, > > unexpected ICE often occurs when precise semantic analysis is required. > > > > gcc/ChangeLog: > > > > * config/riscv/riscv.md: Change "truncate" to "unsigned_fix" for > > the Zfa extension on rv32. > > > > gcc/testsuite/ChangeLog: > > > > * gcc.target/riscv/zfa-fmovh-fmovp-bug.c: New test. > This doesn't look correct. > > fmv.x.w just moves the bits from one place to another, no conversion. > > So I don't see how the original pattern was correct. Using truncate on > an FP mode source isn't defined. But I also don't think the updated > pattern is correct either. > jeff Having matching pattern for these Zfa moves seems pointless because the optimization that utilizes the instructions in riscv_split_doubleword_move() uses: gen_movdfsisi3_rv32(), gen_movsidf2_low_rv32() and gen_movsidf2_high_rv32(). In the XTheadFmv pattern, we use unspec, so the pattern won't match. I think this should be done for Zfa as well.
Re: [PATCH 00/10] __builtin_dynamic_object_size
On Tue, Nov 9, 2021 at 11:02 AM Siddhesh Poyarekar wrote: > > > This patchset implements the __builtin_dynamic_object_size builtin for > gcc. The primary motivation to have this builtin in gcc is to enable > _FORTIFY_SOURCE=3 support with gcc, thus allowing greater fortification > in use cases where the potential performance tradeoff is acceptable. > > Semantics: > -- > > __builtin_dynamic_object_size has the same signature as > __builtin_object_size; it accepts a pointer and type ranging from 0 to 3 > and it returns an object size estimate for the pointer based on an > analysis of which objects the pointer could point to. The actual > properties of the object size estimate are different: > > - In the best case __builtin_dynamic_object_size evaluates to an > expression that represents a precise size of the object being pointed > to. > > - In case a precise object size expression cannot be evaluated, > __builtin_dynamic_object_size attempts to evaluate an estimate size > expression based on the object size type. > > - In what situations the builtin returns an estimate vs a precise > expression is an implementation detail and may change in future. > Users must always assume, as in the case of __builtin_object_size, that > the returned value is the maximum or minimum based on the object size > type they have provided. > > - In the worst case of failure, __builtin_dynamic_object_size returns a > constant (size_t)-1 or (size_t)0. > > Implementation: > --- > > - The __builtin_dynamic_object_size support is implemented in > tree-object-size. In most cases the first pass (early_objsz) is able > to evaluate object size expressions. The second phase mainly ends up > simplifying the __builtin_dynamic_object_size to > __builtin_object_size. > > - The patchset begins with structural modification of the > tree-object-size pass, followed by enhancement to return size > expressions. 
I have split the implementation into one feature per > patch (calls, function parameters, PHI, etc.) to hopefully ease > review. > > Performance: > > > Expressions generated by this pass in theory could be arbitrarily > complex. I have not made an attempt to limit nesting of objects since > it seemed too early to do that. In practice based on the few > applications I built, most of the complexity of the expressions got > folded away. Even so, the performance overhead is likely to be > non-zero. If we find performance degradation to be significant, we > could later add nesting limits to bail out if a size expression gets too > complex. > > I have also not implemented simplification of __*_chk to their normal > variants if we can determine at compile time that it is safe, which > still depends on the object size to be constant. I hope to do this as a > minor performance improvement in stage 3. > > Build time performance doesn't seem to be affected much based on an > unscientific check to time > `make check-gcc RUNTESTFLAGS="dg.exp=builtin*"`. It only increases by > about a couple of seconds when the dynamic tests are added and remains > more or less in the same ballpark otherwise. > > Testing: > > > I have added tests for dynamic object sizes as well as wrappers for all > __builtin_object_size tests to provide wide coverage. With that in > place I have run full bootstrap builds on Fedora rawhide by backporting the > patches to the gcc11 branch and x86_64 and i686 have no failures in any > of the builtin-*object-size* tests and no new failures. > > I have also built bash, cmake, zfs-fuse and systemtap with > _FORTIFY_SOURCE=3 (with a hacked up glibc to make sure it works) and saw > no issues in any of those builds. I did some rudimentary analysis of > the generated binaries to see if there was any difference in coverage > and found that there was. 
In terms of pure numbers, there were far more > _chk variants of calls than the regular ones due to _FORTIFY_SOURCE > (from about 4% to 70% in bash), but that could well be due to the _chk > variants not being folded into regular variants when safe. However, on > manual inspection of some of these sites, it was clear that coverage was > increasing significantly where __builtin_object_size was previously > bailing out. > > Specifically for bash, the coverage went from 30.81% to 86.87% (it was > 84.5% with the v1 patch). I actually hope to reduce this a bit with > folding improvements for __builtin___memcpy_chk, etc. > > A bootstrap build is in progress on x86_64. > > Limitations/Future work: > > > - The most immediate work is to fold _chk variants of builtins into > regular ones when it can be proven at compile time that the object > size will always be less than the length of the write. I am working > on it right now. > > - I need to enable _FORTIFY_SOURCE=3 for gcc in glibc; currently it is > llvm-only. I've started working on these patches too on the side. > > - Instead of ba
Re: [PATCH 00/10] __builtin_dynamic_object_size
On 2024-09-08 18:01, Andrew Pinski wrote:
> When I was implementing improvements into phiopt I ran into a case where objsz would fail now because we get:
>   tmp = PHI
>   ptr = ptr + tmp
> where before the pointer plus was inside each branch instead. So my question is: is there any progress on implementing objsz with ranger, or has that work been put off? I filed https://gcc.gnu.org/PR116556 for this. Do you have any start of patches for this, so maybe it could be taken to finish? If not, then I am going to try to implement it since it blocks my work on phiopt.

Sorry, I noticed it but couldn't get time to work on it. __bos should trivially handle this I think; I'm surprised that it doesn't. As far as using ranger in __bos is concerned, I believe there was strong opposition to the idea, and instead to continue evaluating sizes from first principles as we currently do in the tree-object-size pass. I think the concern was that ranger can be more aggressive in evaluation and assume no undefined behaviour, whereas __bos tries to detect undefined behaviour whenever it can and is more conservative about computations.

I'll be happy to review any changes you make to tree-object-size to fix this.

Thanks,
Sid
RE: [PATCH v1] Vect: Support form 1 of vector signed integer .SAT_ADD
Kindly ping. Pan -Original Message- From: Li, Pan2 Sent: Friday, August 30, 2024 6:16 PM To: gcc-patches@gcc.gnu.org Cc: richard.guent...@gmail.com; tamar.christ...@arm.com; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; jeffreya...@gmail.com; rdapp@gmail.com; Li, Pan2 Subject: [PATCH v1] Vect: Support form 1 of vector signed integer .SAT_ADD From: Pan Li This patch would like to support the vector signed ssadd pattern for the RISC-V backend. Aka Form 1: #define DEF_VEC_SAT_S_ADD_FMT_1(T, UT, MIN, MAX) \ void __attribute__((noinline)) \ vec_sat_s_add_##T##_fmt_1 (T *out, T *x, T *y, unsigned n) \ { \ for (unsigned i = 0; i < n; i++) \ { \ T sum = (UT)x[i] + (UT)y[i]; \ out[i] = (x[i] ^ y[i]) < 0 \ ? sum \ : (sum ^ x[i]) >= 0\ ? sum\ : x[i] < 0 ? MIN : MAX; \ } \ } DEF_VEC_SAT_S_ADD_FMT_1(int64_t, uint64_t, INT64_MIN, INT64_MAX) If the backend implemented the vector mode of ssadd, we will see IR diff similar as below: Before this patch: 108 │ _114 = .SELECT_VL (ivtmp_112, POLY_INT_CST [2, 2]); 109 │ ivtmp_77 = _114 * 8; 110 │ vect__4.9_80 = .MASK_LEN_LOAD (vectp_x.7_78, 64B, { -1, ... }, _114, 0); 111 │ vect__5.10_81 = VIEW_CONVERT_EXPR(vect__4.9_80); 112 │ vect__7.13_85 = .MASK_LEN_LOAD (vectp_y.11_83, 64B, { -1, ... }, _114, 0); 113 │ vect__8.14_86 = VIEW_CONVERT_EXPR(vect__7.13_85); 114 │ vect__9.15_87 = vect__5.10_81 + vect__8.14_86; 115 │ vect_sum_20.16_88 = VIEW_CONVERT_EXPR(vect__9.15_87); 116 │ vect__10.17_89 = vect__4.9_80 ^ vect__7.13_85; 117 │ vect__11.18_90 = vect__4.9_80 ^ vect_sum_20.16_88; 118 │ mask__46.19_92 = vect__10.17_89 >= { 0, ... }; 119 │ _36 = vect__4.9_80 >> 63; 120 │ mask__44.26_104 = vect__11.18_90 < { 0, ... }; 121 │ mask__43.27_105 = mask__46.19_92 & mask__44.26_104; 122 │ _115 = .COND_XOR (mask__43.27_105, _36, { 9223372036854775807, ... }, vect_sum_20.16_88); 123 │ .MASK_LEN_STORE (vectp_out.29_108, 64B, { -1, ... 
}, _114, 0, _115); 124 │ vectp_x.7_79 = vectp_x.7_78 + ivtmp_77; 125 │ vectp_y.11_84 = vectp_y.11_83 + ivtmp_77; 126 │ vectp_out.29_109 = vectp_out.29_108 + ivtmp_77; 127 │ ivtmp_113 = ivtmp_112 - _114; After this patch: 94 │ # vectp_x.7_82 = PHI 95 │ # vectp_y.10_86 = PHI 96 │ # vectp_out.14_91 = PHI 97 │ # ivtmp_95 = PHI 98 │ _97 = .SELECT_VL (ivtmp_95, POLY_INT_CST [2, 2]); 99 │ ivtmp_81 = _97 * 8; 100 │ vect__4.9_84 = .MASK_LEN_LOAD (vectp_x.7_82, 64B, { -1, ... }, _97, 0); 101 │ vect__7.12_88 = .MASK_LEN_LOAD (vectp_y.10_86, 64B, { -1, ... }, _97, 0); 102 │ vect_patt_40.13_89 = .SAT_ADD (vect__4.9_84, vect__7.12_88); 103 │ .MASK_LEN_STORE (vectp_out.14_91, 64B, { -1, ... }, _97, 0, vect_patt_40.13_89); 104 │ vectp_x.7_83 = vectp_x.7_82 + ivtmp_81; 105 │ vectp_y.10_87 = vectp_y.10_86 + ivtmp_81; 106 │ vectp_out.14_92 = vectp_out.14_91 + ivtmp_81; 107 │ ivtmp_96 = ivtmp_95 - _97; The below test suites are passed for this patch: 1. The rv64gcv fully regression tests. 2. The x86 bootstrap tests. 3. The x86 fully regression tests. gcc/ChangeLog: * match.pd: Add case 2 for the signed .SAT_ADD consumed by vect pattern. * tree-vect-patterns.cc (gimple_signed_integer_sat_add): Add new matching func decl for signed .SAT_ADD. (vect_recog_sat_add_pattern): Add signed .SAT_ADD pattern match. Signed-off-by: Pan Li --- gcc/match.pd | 17 + gcc/tree-vect-patterns.cc | 5 - 2 files changed, 21 insertions(+), 1 deletion(-) diff --git a/gcc/match.pd b/gcc/match.pd index be211535a49..578c9dd5b77 100644 --- a/gcc/match.pd +++ b/gcc/match.pd @@ -3207,6 +3207,23 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) (if (INTEGRAL_TYPE_P (type) && !TYPE_UNSIGNED (type) && types_match (type, @0, @1 +/* Signed saturation add, case 2: + T sum = (T)((UT)X + (UT)Y) + SAT_S_ADD = (X ^ Y) < 0 && (X ^ sum) >= 0 ? (-(T)(X < 0) ^ MAX) : sum; + + The T and UT are type pair like T=int8_t, UT=uint8_t. 
*/ +(match (signed_integer_sat_add @0 @1) + (cond^ (bit_and:c (lt (bit_xor:c @0 (nop_convert@2 (plus (nop_convert @0) + (nop_convert @1 + integer_zerop) + (ge (bit_xor:c @0 @1) integer_zerop)) + (bit_xor:c (nop_convert (negate (nop_convert (convert + (l
[PATCH] Don't force_reg operands[3] when it's not const0_rtx.
It fixes the regression bisected to: a51f2fc0d80869ab079a93cc3858f24a1fd28237 is the first bad commit commit a51f2fc0d80869ab079a93cc3858f24a1fd28237 Author: liuhongt Date: Wed Sep 4 15:39:17 2024 +0800 Handle const0_operand for *avx2_pcmp3_1. caused FAIL: gcc.target/i386/pr59539-1.c scan-assembler-times vmovdqu|vmovups 1 To reproduce: $ cd {build_dir}/gcc && make check RUNTESTFLAGS="i386.exp=gcc.target/i386/pr59539-1.c --target_board='unix{-m32\ -march=cascadelake}'" $ cd {build_dir}/gcc && make check RUNTESTFLAGS="i386.exp=gcc.target/i386/pr59539-1.c --target_board='unix{-m64\ -march=cascadelake}'" Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}. Ready to push to trunk. gcc/ChangeLog: * config/i386/sse.md (*avx2_pcmp3_1): Don't force_reg operands[3] when it's not const0_rtx. --- gcc/config/i386/sse.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index 1946d3513be..1ae61182d0c 100644 --- a/gcc/config/i386/sse.md +++ b/gcc/config/i386/sse.md @@ -17929,7 +17929,8 @@ (define_insn_and_split "*avx2_pcmp3_1" if (INTVAL (operands[5]) == 1) std::swap (operands[3], operands[4]); - operands[3] = force_reg (mode, operands[3]); + if (operands[3] == CONST0_RTX (mode)) +operands[3] = force_reg (mode, operands[3]); if (operands[4] == CONST0_RTX (mode)) operands[4] = force_reg (mode, operands[4]); -- 2.31.1
Re: [PATCH v2] RISC-V: Fixed incorrect semantic description in DF to DI pattern in the Zfa extension on rv32.
> Having matching pattern for these Zfa moves seems pointless because > the optimization that utilizes the instructions in > riscv_split_doubleword_move() uses: > gen_movdfsisi3_rv32(), gen_movsidf2_low_rv32() and gen_movsidf2_high_rv32(). > In the XTheadFmv pattern, we use unspec, so the pattern won't match. > I think this should be done for Zfa as well. Yes, that's a very good suggestion. I will try. Thanks. Jin
Re: [PATCH] s390: Fix strict_low_part generation
Ping On Fri, Aug 16, 2024 at 09:14:02AM +0200, Stefan Schulze Frielinghaus wrote: > In s390_expand_insv(), if generating code for ICM et al. src is a MEM > and gen_lowpart might force src into a register such that we end up with > patterns which do not match anymore. Use adjust_address() instead in > order to preserve a MEM. > > Furthermore, it is not straight forward to enforce a subreg. For > example, in case of a paradoxical subreg, gen_lowpart() may return a > register. In order to compensate this, s390_gen_lowpart_subreg() emits > a reference to a pseudo which does not coincide with its definition > which is wrong. Additionally, if dest is a paradoxical subreg, then do > not try to emit a strict_low_part since it could mean that dest was not > initialized even though this might be fixed up later by init-regs. > > Splitter for insn *get_tp_64, *zero_extendhisi2_31, > *zero_extendqisi2_31, *zero_extendqihi2_31 are applied after reload. > Thus, operands[0] is a hard register and gen_lowpart (m, operands[0]) > just returns the hard register for mode m which is fine to use as an > argument for strict_low_part, i.e., we do not need to enforce subregs > here since after reload subregs are supposed to be eliminated anyway. > > This fixes gcc.dg/torture/pr111821.c. > > gcc/ChangeLog: > > * config/s390/s390-protos.h (s390_gen_lowpart_subreg): Remove. > * config/s390/s390.cc (s390_gen_lowpart_subreg): Remove. > (s390_expand_insv): Use adjust_address() and emit a > strict_low_part only in case of a natural subreg. > * config/s390/s390.md: Use gen_lowpart() instead of > s390_gen_lowpart_subreg(). > --- > Bootstrapped and regtested on s390. Ok for mainline,gcc12,gcc13,gcc14? 
> > gcc/config/s390/s390-protos.h | 1 - > gcc/config/s390/s390.cc | 47 +++ > gcc/config/s390/s390.md | 13 +- > 3 files changed, 20 insertions(+), 41 deletions(-) > > diff --git a/gcc/config/s390/s390-protos.h b/gcc/config/s390/s390-protos.h > index b4646ccb606..e7ac59d17da 100644 > --- a/gcc/config/s390/s390-protos.h > +++ b/gcc/config/s390/s390-protos.h > @@ -50,7 +50,6 @@ extern void s390_set_has_landing_pad_p (bool); > extern bool s390_hard_regno_rename_ok (unsigned int, unsigned int); > extern int s390_class_max_nregs (enum reg_class, machine_mode); > extern bool s390_return_addr_from_memory(void); > -extern rtx s390_gen_lowpart_subreg (machine_mode, rtx); > extern bool s390_fma_allowed_p (machine_mode); > #if S390_USE_TARGET_ATTRIBUTE > extern tree s390_valid_target_attribute_tree (tree args, > diff --git a/gcc/config/s390/s390.cc b/gcc/config/s390/s390.cc > index 7aea776da2f..7cdcebfc08b 100644 > --- a/gcc/config/s390/s390.cc > +++ b/gcc/config/s390/s390.cc > @@ -516,31 +516,6 @@ s390_return_addr_from_memory () >return cfun_gpr_save_slot(RETURN_REGNUM) == SAVE_SLOT_STACK; > } > > -/* Generate a SUBREG for the MODE lowpart of EXPR. > - > - In contrast to gen_lowpart it will always return a SUBREG > - expression. This is useful to generate STRICT_LOW_PART > - expressions. */ > -rtx > -s390_gen_lowpart_subreg (machine_mode mode, rtx expr) > -{ > - rtx lowpart = gen_lowpart (mode, expr); > - > - /* There might be no SUBREG in case it could be applied to the hard > - REG rtx or it could be folded with a paradoxical subreg. Bring > - it back. */ > - if (!SUBREG_P (lowpart)) > -{ > - machine_mode reg_mode = TARGET_ZARCH ? DImode : SImode; > - gcc_assert (REG_P (lowpart)); > - lowpart = gen_lowpart_SUBREG (mode, > - gen_rtx_REG (reg_mode, > - REGNO (lowpart))); > -} > - > - return lowpart; > -} > - > /* Return nonzero if it's OK to use fused multiply-add for MODE. 
*/ > bool > s390_fma_allowed_p (machine_mode mode) > @@ -7112,15 +7087,21 @@ s390_expand_insv (rtx dest, rtx op1, rtx op2, rtx src) >/* Emit a strict_low_part pattern if possible. */ >if (smode_bsize == bitsize && bitpos == mode_bsize - smode_bsize) > { > - rtx low_dest = s390_gen_lowpart_subreg (smode, dest); > - rtx low_src = gen_lowpart (smode, src); > - > - switch (smode) > + rtx low_dest = gen_lowpart (smode, dest); > + if (SUBREG_P (low_dest) && !paradoxical_subreg_p (low_dest)) > { > - case E_QImode: emit_insn (gen_movstrictqi (low_dest, low_src)); > return true; > - case E_HImode: emit_insn (gen_movstricthi (low_dest, low_src)); > return true; > - case E_SImode: emit_insn (gen_movstrictsi (low_dest, low_src)); > return true; > - default: break; > + poly_int64 offset = GET_MODE_SIZE (mode) - GET_MODE_SIZE (smode); > + rtx low_src = adjust_address (src, smode, offset); > + switch (smode) > + { > + case E_QImode: emit_insn (gen_movstrictqi (low_dest, low_src)); > +
Re: [PATCH] s390: Fix TF to FPRX2 conversion [PR115860]
Ping On Fri, Aug 16, 2024 at 09:41:55AM +0200, Stefan Schulze Frielinghaus wrote: > Currently subregs originating from *tf_to_fprx2_0 and *tf_to_fprx2_1 > survive register allocation. This in turn leads to wrong register > renaming. Keeping the current approach would mean we need two insns for > *tf_to_fprx2_0 and *tf_to_fprx2_1, respectively. Something along the > lines > > (define_insn "*tf_to_fprx2_0" > [(set (subreg:DF (match_operand:FPRX2 0 "nonimmediate_operand" "=f") 0) > (unspec:DF [(match_operand:TF 1 "general_operand" "v")] >UNSPEC_TF_TO_FPRX2_0))] > "TARGET_VXE" > "#") > > (define_insn "*tf_to_fprx2_0" > [(set (match_operand:DF 0 "nonimmediate_operand" "=f") > (unspec:DF [(match_operand:TF 1 "general_operand" "v")] >UNSPEC_TF_TO_FPRX2_0))] > "TARGET_VXE" > "vpdi\t%v0,%v1,%v0,1 > [(set_attr "op_type" "VRR")]) > > and similar for *tf_to_fprx2_1. Note, pre register allocation operand 0 > has mode FPRX2 and afterwards DF once subregs have been eliminated. > > Since we always copy a whole vector register into a floating-point > register pair, another way to fix this is to merge *tf_to_fprx2_0 and > *tf_to_fprx2_1 into a single insn which means we don't have to use > subregs at all. The downside of this is that the assembler template > contains two instructions, now. The upside is that we don't have to > come up with some artificial insn before RA which might be more > readable/maintainable. That is implemented by this patch. > > In commit r11-4872-ge627cda5686592, the output operand specifier %V was > introduced which is used in tf_to_fprx2 only, now. I didn't come up > with its counterpart like %F for floating-point registers. Instead I > printed the register pair in the output function directly. This spares > us a new and "rare" format specifier for a single insn. 
I don't have a > strong opinion which option to choose, however, we should either add %F > in order to mimic the same behaviour as %V or getting rid of %V and > inline the logic in the output function. I lean towards the latter. > Any preferences? > --- > gcc/config/s390/s390.md| 2 + > gcc/config/s390/vector.md | 66 +++--- > gcc/testsuite/gcc.target/s390/pr115860-1.c | 26 + > 3 files changed, 60 insertions(+), 34 deletions(-) > create mode 100644 gcc/testsuite/gcc.target/s390/pr115860-1.c > > diff --git a/gcc/config/s390/s390.md b/gcc/config/s390/s390.md > index 3d5759d6252..31240899934 100644 > --- a/gcc/config/s390/s390.md > +++ b/gcc/config/s390/s390.md > @@ -241,6 +241,8 @@ > UNSPEC_VEC_VFMIN > UNSPEC_VEC_VFMAX > > + UNSPEC_TF_TO_FPRX2 > + > UNSPEC_NNPA_VCLFNHS_V8HI > UNSPEC_NNPA_VCLFNLS_V8HI > UNSPEC_NNPA_VCRNFS_V8HI > diff --git a/gcc/config/s390/vector.md b/gcc/config/s390/vector.md > index a75b7cb5825..561182e0c2c 100644 > --- a/gcc/config/s390/vector.md > +++ b/gcc/config/s390/vector.md > @@ -907,36 +907,36 @@ >"vmrlg\t%0,%1,%2"; >[(set_attr "op_type" "VRR")]) > > - > -(define_insn "*tf_to_fprx2_0" > - [(set (subreg:DF (match_operand:FPRX2 0 "nonimmediate_operand" "+f") 0) > - (subreg:DF (match_operand:TF1 "general_operand" "v") 0))] > - "TARGET_VXE" > - ; M4 == 1 corresponds to %v0[0] = %v1[0]; %v0[1] = %v0[1]; > - "vpdi\t%v0,%v1,%v0,1" > - [(set_attr "op_type" "VRR")]) > - > -(define_insn "*tf_to_fprx2_1" > - [(set (subreg:DF (match_operand:FPRX2 0 "nonimmediate_operand" "+f") 8) > - (subreg:DF (match_operand:TF1 "general_operand" "v") 8))] > +(define_insn "tf_to_fprx2" > + [(set (match_operand:FPRX2 0 "register_operand" "=f,f ,f") > + (unspec:FPRX2 [(match_operand:TF 1 "general_operand" "v,AR,AT")] > + UNSPEC_TF_TO_FPRX2))] >"TARGET_VXE" > - ; M4 == 5 corresponds to %V0[0] = %v1[1]; %V0[1] = %V0[1]; > - "vpdi\t%V0,%v1,%V0,5" > - [(set_attr "op_type" "VRR")]) > - > -(define_insn_and_split "tf_to_fprx2" > - [(set (match_operand:FPRX20 
"nonimmediate_operand" "=f,f") > - (subreg:FPRX2 (match_operand:TF 1 "general_operand" "v,AR") 0))] > - "TARGET_VXE" > - "#" > - "!(MEM_P (operands[1]) && MEM_VOLATILE_P (operands[1]))" > - [(set (match_dup 2) (match_dup 3)) > - (set (match_dup 4) (match_dup 5))] > { > - operands[2] = simplify_gen_subreg (DFmode, operands[0], FPRX2mode, 0); > - operands[3] = simplify_gen_subreg (DFmode, operands[1], TFmode, 0); > - operands[4] = simplify_gen_subreg (DFmode, operands[0], FPRX2mode, 8); > - operands[5] = simplify_gen_subreg (DFmode, operands[1], TFmode, 8); > + char buf[64]; > + switch (which_alternative) > +{ > +case 0: > + if (REGNO (operands[0]) == REGNO (operands[1])) > + return "vpdi\t%V0,%v1,%V0,5"; > + else > + return "ldr\t%f0,%f1;vpdi\t%V0,%v1,%V0,5"; > +case 1: > + { > + const char *reg_pair = reg_names[REGNO (operands[0]) + 1]; > + snprintf (buf, si
[PING^6][PATCH v2] rs6000: Fix issue in specifying PTImode as an attribute [PR106895]
Ping * 6 Please review! On 21/03/24 6:21 pm, jeevitha wrote: > Hi All, > > The following patch has been bootstrapped and regtested on powerpc64le-linux. > > PTImode assists in generating even/odd register pairs on 128 bits. When the > user > specifies PTImode as an attribute, it breaks because there is no internal > type > to handle this mode. To address this, we have created a tree node with dummy > type > to handle PTImode. We are not documenting this dummy type since users are not > allowed to use this type externally. > > 2024-03-21 Jeevitha Palanisamy > > gcc/ > PR target/110411 > * config/rs6000/rs6000.h (enum rs6000_builtin_type_index): Add > RS6000_BTI_INTPTI. > * config/rs6000/rs6000-builtin.cc (rs6000_init_builtins): Add node > for PTImode type. > > gcc/testsuite/ > PR target/106895 > * gcc.target/powerpc/pr106895.c: New testcase. > > diff --git a/gcc/config/rs6000/rs6000-builtin.cc > b/gcc/config/rs6000/rs6000-builtin.cc > index 6698274031b..f553c72779e 100644 > --- a/gcc/config/rs6000/rs6000-builtin.cc > +++ b/gcc/config/rs6000/rs6000-builtin.cc > @@ -756,6 +756,15 @@ rs6000_init_builtins (void) >else > ieee128_float_type_node = NULL_TREE; > > + /* PTImode to get even/odd register pairs. */ > + intPTI_type_internal_node = make_node(INTEGER_TYPE); > + TYPE_PRECISION (intPTI_type_internal_node) = GET_MODE_BITSIZE (PTImode); > + layout_type (intPTI_type_internal_node); > + SET_TYPE_MODE (intPTI_type_internal_node, PTImode); > + t = build_qualified_type (intPTI_type_internal_node, TYPE_QUAL_CONST); > + lang_hooks.types.register_builtin_type (intPTI_type_internal_node, > + "__dummypti"); > + >/* Vector pair and vector quad support. 
*/ >vector_pair_type_node = make_node (OPAQUE_TYPE); >SET_TYPE_MODE (vector_pair_type_node, OOmode); > diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h > index 68bc45d65ba..b6078077b20 100644 > --- a/gcc/config/rs6000/rs6000.h > +++ b/gcc/config/rs6000/rs6000.h > @@ -2302,6 +2302,7 @@ enum rs6000_builtin_type_index >RS6000_BTI_ptr_vector_quad, >RS6000_BTI_ptr_long_long, >RS6000_BTI_ptr_long_long_unsigned, > + RS6000_BTI_INTPTI, >RS6000_BTI_MAX > }; > > @@ -2346,6 +2347,7 @@ enum rs6000_builtin_type_index > #define uintDI_type_internal_node > (rs6000_builtin_types[RS6000_BTI_UINTDI]) > #define intTI_type_internal_node > (rs6000_builtin_types[RS6000_BTI_INTTI]) > #define uintTI_type_internal_node > (rs6000_builtin_types[RS6000_BTI_UINTTI]) > +#define intPTI_type_internal_node > (rs6000_builtin_types[RS6000_BTI_INTPTI]) > #define float_type_internal_node > (rs6000_builtin_types[RS6000_BTI_float]) > #define double_type_internal_node > (rs6000_builtin_types[RS6000_BTI_double]) > #define long_double_type_internal_node > (rs6000_builtin_types[RS6000_BTI_long_double]) > diff --git a/gcc/testsuite/gcc.target/powerpc/pr106895.c > b/gcc/testsuite/gcc.target/powerpc/pr106895.c > new file mode 100644 > index 000..56547b7fa9d > --- /dev/null > +++ b/gcc/testsuite/gcc.target/powerpc/pr106895.c > @@ -0,0 +1,15 @@ > +/* PR target/106895 */ > +/* { dg-require-effective-target int128 } */ > +/* { dg-options "-O2" } */ > + > +/* Verify the following generates even/odd register pairs. */ > + > +typedef __int128 pti __attribute__((mode(PTI))); > + > +void > +set128 (pti val, pti *mem) > +{ > +asm("stq %1,%0" : "=m"(*mem) : "r"(val)); > +} > + > +/* { dg-final { scan-assembler "stq \[123\]?\[02468\]" } } */ >
RE: [nvptx] Pass -m32/-m64 to host_compiler if it has multilib support
> -----Original Message----- > From: Thomas Schwinge > Sent: Friday, September 6, 2024 2:31 PM > To: Prathamesh Kulkarni ; Richard Biener > > Cc: Andrew Pinski ; gcc-patches@gcc.gnu.org; Jakub > Jelinek > Subject: RE: [nvptx] Pass -m32/-m64 to host_compiler if it has > multilib support > > External email: Use caution opening links or attachments > > > Hi! Hi Thomas, Thanks for the review! > > On 2024-08-16T15:36:29+, Prathamesh Kulkarni > wrote: > >> > Am 13.08.2024 um 17:48 schrieb Thomas Schwinge > >> : > >> > On 2024-08-12T07:50:07+, Prathamesh Kulkarni > >> wrote: > >> >>> From: Thomas Schwinge > >> >>> Sent: Friday, August 9, 2024 12:55 AM > >> > > >> >>> On 2024-08-08T06:46:25-0700, Andrew Pinski > >> wrote: > >> On Thu, Aug 8, 2024 at 6:11 AM Prathamesh Kulkarni > >> wrote: > >> > compiled with -fopenmp -foffload=nvptx-none now fails with: > >> > gcc: error: unrecognized command-line option '-m64' > >> > nvptx mkoffload: fatal error: ../install/bin/gcc returned 1 > >> > exit > >> >>> status compilation terminated. > >> >>> > >> >>> Heh. Yeah... > >> >>> > >> > As mentioned in the RFC email, this happens because > >> > nvptx/mkoffload.cc:compile_native passes -m64/-m32 to host > >> > compiler > >> >>> depending on whether offload_abi is OFFLOAD_ABI_LP64 or > >> >>> OFFLOAD_ABI_ILP32, and the aarch64 backend doesn't recognize these > >> >>> options. > >> > > >> >>> So, my idea is: instead of the current strategy that the host > >> >>> 'TARGET_OFFLOAD_OPTIONS' synthesizes '-foffload-abi=lp64' etc., > >> >>> which the 'mkoffload's then interpret and re-synthesize '-m64' > etc. > >> >>> -- how about we instead directly tell the 'mkoffload's the > >> >>> relevant ABI options? That is, 'TARGET_OFFLOAD_OPTIONS' > instead > >> >>> synthesizes '-foffload-abi=-m64' > >> >>> etc., which the 'mkoffload's can then readily use. Could you > >> >>> please give that a try, and/or does anyone see any issues with > that approach? 
> >> >>> > >> >>> And use something like '-foffload-abi=disable' to replace the > current: > >> >>> > >> >>>/* PR libgomp/65099: Currently, we only support offloading > in 64-bit > >> >>> configurations. */ > >> >>>if (offload_abi == OFFLOAD_ABI_LP64) > >> >>> { > >> >>> > >> >>> (As discussed before, this should be done differently > altogether, > >> >>> but that's for another day.) > >> >> Sorry, I don't quite follow. Currently we enable offloading if > >> >> offload_abi == OFFLOAD_ABI_LP64, which is synthesized from > >> >> -foffload-abi=lp64. If we change -foffload-abi to instead > specify > >> >> host-specific ABI opts, I guess mkoffload will still need to > >> >> somehow figure out which ABI is used, so it can disable > offloading > >> >> for 32-bit ? I suppose we could adjust TARGET_OFFLOAD_OPTIONS > for > >> >> each host to > >> pass -foffload-abi=disable if TARGET_ILP32 is set and offload > target > >> is nvptx, but not sure if that'd be correct ? > >> > > >> > Basically, yes. My idea was that all 'TARGET_OFFLOAD_OPTIONS' > >> > implementations return either the correct host flags to be used > by > >> > the 'mkoffload's (the case that offloading is supported for the > >> > current host flags/ABI configuration), or otherwise return '- > foffload-abi=disable'. > > Oh..., you're right of course: we do need to continue to tell the > 'mkoffload's which kind of offload code to generate! My bad... > > >> >> I added another option -foffload-abi-host-opts to specify host > abi > >> >> opts, and leave -foffload-abi to specify if ABI is 32/64 bit > which > >> >> mkoffload can use to enable/disable offloading (as before). > >> > > >> > I'm not sure however, if this additional option is really > necessary? > > Well, my concern was if that'd change the behavior for TARGET_ILP32 > ? 
> > IIUC, currently for -foffload-abi=ilp32, mkoffload will create empty > C > > file for ptx_cfile_name (instead of munged ptx assembly since > > offloading will be disabled), and pass that to host compiler with - > m32 option (in compile_native). > > > > If we change -foffload-abi to specify ABI host opts, and pass > > -foffload-abi=disable for TARGET_ILP32 in TARGET_OFFLOAD_OPTIONS, > > mkoffload will no longer be able to pass 32-bit ABI opts to host > > compiler, which may result in linker error (arch mismatch?) if the > > host object files are 32-bit ABI and xnvptx-none.o is 64-bit > (assuming the host compiler is configured to generate 64-bit code-gen > by default) ? > > > > So, I thought to add another option -foffload-abi-host-opts to pass > > host-specific ABI opts, and keep -foffload-abi as-is to infer ABI > type for enabling/disabling offloading. > > Quite right, yes. > > >> -Original Message- > >> From: Richard Biener > >> Sent: Tuesday, August 13, 2024 10:06 PM > > >> Since we do not support 32 -> 64 bit offload > > We don't -- but it's generally possible. As Tobias recently educated > me, the OpenMP specification expl
Match: Fix ordered and nonequal: Fix 'gcc.dg/opt-ordered-and-nonequal-1.c' re 'LOGICAL_OP_NON_SHORT_CIRCUIT' [PR116635] (was: [PATCH] Match: Fix ordered and nonequal)
Hi! On 2024-09-04T13:43:45+0800, "Hu, Lin1" wrote: > --- /dev/null > +++ b/gcc/testsuite/gcc.dg/opt-ordered-and-nonequal-1.c > @@ -0,0 +1,49 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O2 -fdump-tree-forwprop1-details" } */ > + > +int is_ordered_and_nonequal_sh_1 (float a, float b) > +{ > + return !__builtin_isunordered (a, b) && (a != b); > +} > + > +[...] > + > +/* { dg-final { scan-tree-dump-times "gimple_simplified to\[^\n\r]*<>" 9 > "forwprop1" } } */ OK to push "Match: Fix ordered and nonequal: Fix 'gcc.dg/opt-ordered-and-nonequal-1.c' re 'LOGICAL_OP_NON_SHORT_CIRCUIT' [PR116635]", see attached? Regards, Thomas From 3e85cb373fb86db5fad86a452a69e713c7050f16 Mon Sep 17 00:00:00 2001 From: Thomas Schwinge Date: Mon, 9 Sep 2024 08:39:10 +0200 Subject: [PATCH] Match: Fix ordered and nonequal: Fix 'gcc.dg/opt-ordered-and-nonequal-1.c' re 'LOGICAL_OP_NON_SHORT_CIRCUIT' [PR116635] Fix up to make 'gcc.dg/opt-ordered-and-nonequal-1.c' of commit 91421e21e8f0f05f440174b8de7a43a311700e08 "Match: Fix ordered and nonequal" work for default 'LOGICAL_OP_NON_SHORT_CIRCUIT == false' configurations. PR testsuite/116635 gcc/testsuite/ * gcc.dg/opt-ordered-and-nonequal-1.c: Fix re 'LOGICAL_OP_NON_SHORT_CIRCUIT'. --- gcc/testsuite/gcc.dg/opt-ordered-and-nonequal-1.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/gcc/testsuite/gcc.dg/opt-ordered-and-nonequal-1.c b/gcc/testsuite/gcc.dg/opt-ordered-and-nonequal-1.c index 6d102c2bd0c..d61c3322214 100644 --- a/gcc/testsuite/gcc.dg/opt-ordered-and-nonequal-1.c +++ b/gcc/testsuite/gcc.dg/opt-ordered-and-nonequal-1.c @@ -1,5 +1,7 @@ /* { dg-do compile } */ /* { dg-options "-O2 -fdump-tree-forwprop1-details" } */ +/* Make this work for default 'LOGICAL_OP_NON_SHORT_CIRCUIT == false' configurations: + { dg-additional-options "--param logical-op-non-short-circuit=1" } */ int is_ordered_and_nonequal_sh_1 (float a, float b) { -- 2.34.1
Re: [patch,reload] PR116326: Add #define IN_RELOAD1_CC
On Sat, Sep 7, 2024 at 8:22 PM H.J. Lu wrote: > > > On Sat, Sep 7, 2024, 11:07 AM Georg-Johann Lay wrote: >> >> Am 07.09.24 um 19:51 schrieb H.J. Lu: >> > On Sat, Sep 7, 2024 at 10:49 AM Georg-Johann Lay wrote: >> >> Am 07.09.24 um 18:35 schrieb Jeff Law: >> >>> On 9/6/24 3:40 AM, Georg-Johann Lay wrote: >> The reason for PR116326 is that LRA and reload require different >> ELIMINABLE_REGS for a multi-register frame pointer. As ELIMINABLE_REGS >> is used to initialize static const objects, it is not possible to make >> ELIMINABLE_REGS depend on options or patch it in some target hook. >> >> It was also concluded that it is not desirable to adjust reload so that >> it behaves like LRA, but a hack like #define IN_RELOAD1_CC at the top >> of reload1.cc would be fine, see >> >> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116326#c8 >> >> Here is a corresponding patch that defines IN_RELOAD1_CC and uses it in >> avr.h to define ELIMINABLE_REGS. >> >> This is only required for trunk. >> >> PR116326 occurred for some test case in avr-torture.exp, so I didn't >> duplicate the test case. >> >> As it appears, this patch also fixes: >> >> https://gcc.gnu.org/PR116324 >> https://gcc.gnu.org/PR116325 >> https://gcc.gnu.org/PR116550 >> >> Johann >> >> -- >> >> AVR: target/116326 - Adjust ELIMINABLE_REGS to reload resp. LRA. >> >> PR target/116326 >> gcc/ >> * reload1.cc (IN_RELOAD1_CC): Define prior to all includes. >> * config/avr/avr.h (ELIMINABLE_REGS): Depend on IN_RELOAD1_CC. >> >>> I'm going to have to NAK this. It's too hackish, even with reload going >> >>> away >> >>> >> >>> Jeff >> >> >> >> So what is your proposal? >> >> >> >> I'd agree with Richard that we don't want to change reload >> >> implementations such that they agree on ELIMINABLE_REGS. >> >> >> >> A different way would be to change reload1.cc such that it includes >> >> reload.h prior to tm.h, i.e. prior to backend.h. This would keep the >> >> hack entirely within the avr realm, but it won't be a trivial no-op. 
>> >> >> >> Johann >> > >> > Why not add RELOAD_ELIMINABLE_REGS and default it >> > to ELIMINABLE_REGS? >> >> I can't find RELOAD_ELIMINABLE_REGS anywhere. >> So are you proposing a new hook macro? > > > A new macro. Sounds good to me btw. - the other alternative is to avoid having -m{,no-}lra and default to LRA now (not sure if that's feasible). Richard. >> >> Johann >>