[PATCH] x86_64: Add insn patterns for V1TI mode logic operations.
On x86_64, V1TI mode holds a 128-bit integer value in a (vector) SSE register (where regular TI mode uses a pair of 64-bit general purpose scalar registers). This patch improves the implementation of AND, IOR, XOR and NOT on these values.

The benefit is demonstrated by the following simple test program:

typedef unsigned __int128 v1ti __attribute__ ((__vector_size__ (16)));
v1ti and(v1ti x, v1ti y) { return x & y; }
v1ti ior(v1ti x, v1ti y) { return x | y; }
v1ti xor(v1ti x, v1ti y) { return x ^ y; }
v1ti not(v1ti x) { return ~x; }

For which GCC currently generates the rather large:

and:    movdqa  %xmm0, %xmm2
        movq    %xmm1, %rdx
        movq    %xmm0, %rax
        andq    %rdx, %rax
        movhlps %xmm2, %xmm3
        movhlps %xmm1, %xmm4
        movq    %rax, %xmm0
        movq    %xmm4, %rdx
        movq    %xmm3, %rax
        andq    %rdx, %rax
        movq    %rax, %xmm5
        punpcklqdq      %xmm5, %xmm0
        ret

ior:    movdqa  %xmm0, %xmm2
        movq    %xmm1, %rdx
        movq    %xmm0, %rax
        orq     %rdx, %rax
        movhlps %xmm2, %xmm3
        movhlps %xmm1, %xmm4
        movq    %rax, %xmm0
        movq    %xmm4, %rdx
        movq    %xmm3, %rax
        orq     %rdx, %rax
        movq    %rax, %xmm5
        punpcklqdq      %xmm5, %xmm0
        ret

xor:    movdqa  %xmm0, %xmm2
        movq    %xmm1, %rdx
        movq    %xmm0, %rax
        xorq    %rdx, %rax
        movhlps %xmm2, %xmm3
        movhlps %xmm1, %xmm4
        movq    %rax, %xmm0
        movq    %xmm4, %rdx
        movq    %xmm3, %rax
        xorq    %rdx, %rax
        movq    %rax, %xmm5
        punpcklqdq      %xmm5, %xmm0
        ret

not:    movdqa  %xmm0, %xmm1
        movq    %xmm0, %rax
        notq    %rax
        movhlps %xmm1, %xmm2
        movq    %rax, %xmm0
        movq    %xmm2, %rax
        notq    %rax
        movq    %rax, %xmm3
        punpcklqdq      %xmm3, %xmm0
        ret

With this patch we now generate the much more efficient:

and:    pand    %xmm1, %xmm0
        ret

ior:    por     %xmm1, %xmm0
        ret

xor:    pxor    %xmm1, %xmm0
        ret

not:    pcmpeqd %xmm1, %xmm1
        pxor    %xmm1, %xmm0
        ret

For my first few attempts at this patch I tried adding V1TI to the existing VI and VI12_AVX_512F mode iterators, but these then have dependencies on other iterators (and attributes), and so on until everything ties itself into a knot, as V1TI mode isn't really a first-class vector mode on x86_64. Hence I ultimately opted to use simple stand-alone patterns (as used by the existing TF mode support).
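As a minimal sketch of what the new patterns have to preserve, the following checks that logic operations on the one-element vector type give the same bits as the corresponding scalar unsigned __int128 operations. It assumes a GCC-compatible compiler with __int128 and vector extensions; the helper names (make_u128, to_v1ti, from_v1ti, v1ti_logic_agrees) are made up for illustration and are not part of the patch or its testsuite.

```c
#include <assert.h>
#include <string.h>

typedef unsigned __int128 u128;
typedef unsigned __int128 v1ti __attribute__ ((__vector_size__ (16)));

/* Build a 128-bit value from two 64-bit halves (hypothetical helper).  */
static u128 make_u128 (unsigned long long hi, unsigned long long lo)
{
  return ((u128) hi << 64) | lo;
}

/* Copy the bits of a scalar into a one-element vector and back.  */
static v1ti to_v1ti (u128 x)
{
  v1ti v;
  assert (sizeof v == sizeof x);
  memcpy (&v, &x, sizeof v);
  return v;
}

static u128 from_v1ti (v1ti v)
{
  u128 x;
  memcpy (&x, &v, sizeof x);
  return x;
}

/* The four operations from the test program, under non-reserved names.  */
static v1ti v_and (v1ti x, v1ti y) { return x & y; }
static v1ti v_ior (v1ti x, v1ti y) { return x | y; }
static v1ti v_xor (v1ti x, v1ti y) { return x ^ y; }
static v1ti v_not (v1ti x) { return ~x; }

/* Return 1 if every vector operation matches its scalar counterpart.  */
static int v1ti_logic_agrees (u128 a, u128 b)
{
  v1ti va = to_v1ti (a), vb = to_v1ti (b);
  return from_v1ti (v_and (va, vb)) == (a & b)
         && from_v1ti (v_ior (va, vb)) == (a | b)
         && from_v1ti (v_xor (va, vb)) == (a ^ b)
         && from_v1ti (v_not (va)) == (u128) ~a;
}
```

Whether the compiler emits pand/por/pxor for these functions depends on the compiler version and options; the check above only concerns the computed values.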
This patch has been tested on x86_64-pc-linux-gnu with "make bootstrap" and "make -k check" with no new failures. Ok for mainline?

2021-10-22  Roger Sayle

gcc/ChangeLog
        * config/i386/sse.md (v1ti3): New define_insn to implement V1TImode AND, IOR and XOR on TARGET_SSE2 (and above).
        (one_cmplv1ti2): New define expand.

gcc/testsuite/ChangeLog
        * gcc.target/i386/sse2-v1ti-logic.c: New test case.
        * gcc.target/i386/sse2-v1ti-logic-2.c: New test case.

Thanks in advance,
Roger
--

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index fbf056b..f37c5c0 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -16268,6 +16268,31 @@
           ]
           (const_string "")))])
 
+(define_insn "v1ti3"
+  [(set (match_operand:V1TI 0 "register_operand" "=x,x,v")
+        (any_logic:V1TI
+          (match_operand:V1TI 1 "register_operand" "%0,x,v")
+          (match_operand:V1TI 2 "vector_operand" "xBm,xm,vm")))]
+  "TARGET_SSE2"
+  "@
+   p\t{%2, %0|%0, %2}
+   vp\t{%2, %1, %0|%0, %1, %2}
+   vp\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "isa" "noavx,avx,avx")
+   (set_attr "prefix" "orig,vex,evex")
+   (set_attr "prefix_data16" "1,*,*")
+   (set_attr "type" "sselog")
+   (set_attr "mode" "TI")])
+
+(define_expand "one_cmplv1ti2"
+  [(set (match_operand:V1TI 0 "register_operand")
+        (xor:V1TI (match_operand:V1TI 1 "register_operand")
+                  (match_dup 2)))]
+  "TARGET_SSE2"
+{
+  operands[2] = force_reg (V1TImode, CONSTM1_RTX (V1TImode));
+})
+
 (define_mode_iterator AVX512ZEXTMASK
   [(DI "TARGET_AVX512BW") (SI "TARGET_AVX512BW") HI])

/* { dg-do compile { target int128 } } */
/* { dg-options "-O2 -msse2" } */
/* { dg-require-effective-target sse2 } */

typedef unsigned __int128 v1ti __attribute__ ((__vector_size__ (16)));

v1ti and(v1ti x, v1ti y) { return x & y; }
v1ti ior(v1ti x, v1ti y) { return x | y; }
v1ti xor(v1ti x, v1ti y) { return x ^ y; }
v1ti not(v1ti x) { return ~x; }

/* { dg-final { scan-assembler "pand" } } */
/* { dg-final { scan-assembler "por" } } */
/* { dg-final { scan-assembler-times "pxor" 2 } } */

/* { dg-do compile { target int128 } } */
/* { dg-options "-O2 -msse2" } */
/* { dg-require-effective-target sse2 } */

typedef unsigned __int128 v1ti __attribute__ ((__vector_size__ (16)));

v1ti x;
v1ti y;
v1ti z;

void and2() { x &= y; }
void and3() { x
José's pending bind(C) patches / status (was: Re: [Patch, fortran V3] PR fortran/100683 - Array initialization refuses valid)
Hi José, hi all,

especially since my patch which moved the descriptor conversion from libgfortran to gfortran is in, I wonder whether there are still patches to be applied and useful testcases; I assume there are more issues in Bugzilla, especially as I filed some myself (often related to polymorphism or assumed rank). While I (and also Sandra) try to resolve some bugs and look at testcases:

it would be helpful if others – in particular José – could check whether: (a) PRs can now be closed, (b) testcases exist which should still be added, (c) patches exist which are still applicable (even if they need to be revised). (Partial/full list below.)

I hope that we can really clean up this backlog – and possibly also fix some of the remaining bugs before GCC 12 is released!

And kudos to José for the bug reports, testcases and patches – and sorry for the slow reviews. I hope we resolve the pending issues and are quicker in future.

Tobias

PS: Old and probably current but incomplete pending patch list:

On 21.06.21 17:21, José Rui Faustino de Sousa wrote:
> On 21/06/21 12:37, Tobias Burnus wrote:
>> Thus: Do you have a list of patches pending review?
>
> https://gcc.gnu.org/pipermail/fortran/2021-April/055924.html
> https://gcc.gnu.org/pipermail/fortran/2021-April/055933.html
> https://gcc.gnu.org/pipermail/fortran/2021-June/056168.html
> https://gcc.gnu.org/pipermail/fortran/2021-June/056167.html
> https://gcc.gnu.org/pipermail/fortran/2021-June/056163.html
> https://gcc.gnu.org/pipermail/fortran/2021-June/056162.html
> https://gcc.gnu.org/pipermail/fortran/2021-June/056155.html
> https://gcc.gnu.org/pipermail/fortran/2021-June/056154.html
> https://gcc.gnu.org/pipermail/fortran/2021-June/056152.html
> https://gcc.gnu.org/pipermail/fortran/2021-June/056159.html
> https://gcc.gnu.org/pipermail/fortran/2021-April/055982.html
> https://gcc.gnu.org/pipermail/fortran/2021-April/055949.html
> https://gcc.gnu.org/pipermail/fortran/2021-April/055946.html
> https://gcc.gnu.org/pipermail/fortran/2021-April/055934.html
> https://gcc.gnu.org/pipermail/fortran/2021-June/056169.html
> https://gcc.gnu.org/pipermail/fortran/2021-April/055921.html
>
> I am not 100% sure this is all of them but it should be most.
-
Siemens Electronic Design Automation GmbH; Address: Arnulfstraße 201, 80634 München; limited liability company; Managing Directors: Thomas Heurung, Frank Thürauf; Registered office: München; Registration court: München, HRB 106955
Re: [match.pd] PR83750 - CSE erf/erfc pair
On Wed, 20 Oct 2021 at 18:21, Richard Biener wrote: > > On Wed, 20 Oct 2021, Prathamesh Kulkarni wrote: > > > On Tue, 19 Oct 2021 at 16:55, Richard Biener wrote: > > > > > > On Tue, 19 Oct 2021, Prathamesh Kulkarni wrote: > > > > > > > On Tue, 19 Oct 2021 at 13:02, Richard Biener > > > > wrote: > > > > > > > > > > On Tue, Oct 19, 2021 at 9:03 AM Prathamesh Kulkarni via Gcc-patches > > > > > wrote: > > > > > > > > > > > > On Mon, 18 Oct 2021 at 17:23, Richard Biener > > > > > > wrote: > > > > > > > > > > > > > > On Mon, 18 Oct 2021, Prathamesh Kulkarni wrote: > > > > > > > > > > > > > > > On Mon, 18 Oct 2021 at 17:10, Richard Biener > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > On Mon, 18 Oct 2021, Prathamesh Kulkarni wrote: > > > > > > > > > > > > > > > > > > > On Mon, 18 Oct 2021 at 16:18, Richard Biener > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > > On Mon, 18 Oct 2021, Prathamesh Kulkarni wrote: > > > > > > > > > > > > > > > > > > > > > > > Hi Richard, > > > > > > > > > > > > As suggested in PR, I have attached WIP patch that adds > > > > > > > > > > > > two patterns > > > > > > > > > > > > to match.pd: > > > > > > > > > > > > erfc(x) --> 1 - erf(x) if canonicalize_math_p() and, > > > > > > > > > > > > 1 - erf(x) --> erfc(x) if !canonicalize_math_p(). 
> > > > > > > > > > > > > > > > > > > > > > > > This works to remove call to erfc for the following > > > > > > > > > > > > test: > > > > > > > > > > > > double f(double x) > > > > > > > > > > > > { > > > > > > > > > > > > double g(double, double); > > > > > > > > > > > > > > > > > > > > > > > > double t1 = __builtin_erf (x); > > > > > > > > > > > > double t2 = __builtin_erfc (x); > > > > > > > > > > > > return g(t1, t2); > > > > > > > > > > > > } > > > > > > > > > > > > > > > > > > > > > > > > with .optimized dump shows: > > > > > > > > > > > > t1_2 = __builtin_erf (x_1(D)); > > > > > > > > > > > > t2_3 = 1.0e+0 - t1_2; > > > > > > > > > > > > > > > > > > > > > > > > However, for the following test: > > > > > > > > > > > > double f(double x) > > > > > > > > > > > > { > > > > > > > > > > > > double g(double, double); > > > > > > > > > > > > > > > > > > > > > > > > double t1 = __builtin_erfc (x); > > > > > > > > > > > > return t1; > > > > > > > > > > > > } > > > > > > > > > > > > > > > > > > > > > > > > It canonicalizes erfc(x) to 1 - erf(x), but does not > > > > > > > > > > > > transform 1 - > > > > > > > > > > > > erf(x) to erfc(x) again > > > > > > > > > > > > post canonicalization. > > > > > > > > > > > > -fdump-tree-folding shows that 1 - erf(x) --> erfc(x) > > > > > > > > > > > > gets applied, > > > > > > > > > > > > but then it tries to > > > > > > > > > > > > resimplify erfc(x), which fails post canonicalization. > > > > > > > > > > > > So we end up > > > > > > > > > > > > with erfc(x) transformed to > > > > > > > > > > > > 1 - erf(x) in .optimized dump, which I suppose isn't > > > > > > > > > > > > ideal. > > > > > > > > > > > > Could you suggest how to proceed ? > > > > > > > > > > > > > > > > > > > > > > I applied your patch manually and it does the intended > > > > > > > > > > > simplifications so I wonder what I am missing? 
> > > > > > > > > > Would it be OK to always fold erfc(x) -> 1 - erf(x) even > > > > > > > > > > when there's > > > > > > > > > > no erf(x) in the source ? > > > > > > > > > > > > > > > > > > I do think it's reasonable to expect erfc to be available > > > > > > > > > when erf > > > > > > > > > is and vice versa but note both are C99 specified functions > > > > > > > > > (either > > > > > > > > > requires -lm). > > > > > > > > OK, thanks. Would it be OK to commit the patch after > > > > > > > > bootstrap+test ? > > > > > > > > > > > > > > Yes, but I'm confused because you say the patch doesn't work for > > > > > > > you? > > > > > > The patch works for me to CSE erf/erfc pair. > > > > > > However when there's only erfc in the source, it canonicalizes > > > > > > erfc(x) > > > > > > to 1 - erf(x) but later fails to uncanonicalize 1 - erf(x) back to > > > > > > erfc(x) > > > > > > with -O3 -funsafe-math-optimizations. > > > > > > > > > > > > For, > > > > > > t1 = __builtin_erfc(x), > > > > > > > > > > > > .optimized dump shows: > > > > > > _2 = __builtin_erf (x_1(D)); > > > > > > t1_3 = 1.0e+0 - _2; > > > > > > > > > > > > and for, > > > > > > double t1 = x + __builtin_erfc(x); > > > > > > > > > > > > .optimized dump shows: > > > > > > _3 = __builtin_erf (x_2(D)); > > > > > > _7 = x_2(D) + 1.0e+0; > > > > > > t1_4 = _7 - _3; > > > > > > > > > > > > I assume in both cases, we want erfc in the code-gen instead ? > > > > > > I think the reason uncaonicalization fails is because the pattern 1 > > > > > > - > > > > > > erf(x) to erfc(x) > > > > > > gets applied, but then it fails in resimplifying erfc(x), and we end > > > > > > up with 1 - erf(x) in code-gen. > > > > > > > > > > > > From gimple-match.c, it hits the simplification: > > > > > > >
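For reference, the two match.pd patterns discussed in this thread rest on the standard identity between the error function and its complement, restated below together with the leading term of the large-x behaviour of erfc. This is a general mathematical note, not something taken from the patch.

```latex
\operatorname{erfc}(x) \;=\; 1 - \operatorname{erf}(x)
  \;=\; \frac{2}{\sqrt{\pi}} \int_x^{\infty} e^{-t^2}\,dt ,
\qquad
\operatorname{erfc}(x) \sim \frac{e^{-x^2}}{x\sqrt{\pi}} \quad (x \to \infty).
```

The identity is exact over the reals, but in floating point erf(x) rounds to 1 for large x, so 1 - erf(x) evaluates to 0 while erfc(x) is still a small nonzero value. That is presumably why the rewrite is only discussed here under -funsafe-math-optimizations.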
Re: [PATCH] Handle jobserver file descriptors in btest.
On 10/21/21 20:15, Ian Lance Taylor wrote:
> On Thu, Oct 21, 2021 at 12:48 AM Martin Liška wrote:
>> The patch is about sensitive handling of file descriptors opened by make's jobserver.
>
> Thanks. I think a better approach would be, at the start of main, fstat the descriptors up to 10 and record the ones for which fstat succeeds. Then at the end of main only check the descriptors for which fstat failed earlier.

Sure, makes sense.

> I can work on that at some point if you don't want to tackle it.

I've just done that in the attached patch. Is it fine?

Thanks,
Martin

> Ian

From ad52a33e10f76119867dbf0b6d5875378aa399ab Mon Sep 17 00:00:00 2001
From: Martin Liska
Date: Fri, 22 Oct 2021 10:12:56 +0200
Subject: [PATCH] Handle jobserver file descriptors in btest.

PR testsuite/102742

libbacktrace/ChangeLog:

        * btest.c (MIN_DESCRIPTOR): New.
        (MAX_DESCRIPTOR): Likewise.
        (check_available_files): Likewise.
        (check_open_files): Check only file descriptors that were not available at the entry.
        (main): Call check_available_files.
---
 libbacktrace/btest.c | 24
 1 file changed, 20 insertions(+), 4 deletions(-)

diff --git a/libbacktrace/btest.c b/libbacktrace/btest.c
index 9f9c03babf3..7ef6d320497 100644
--- a/libbacktrace/btest.c
+++ b/libbacktrace/btest.c
@@ -38,6 +38,7 @@ POSSIBILITY OF SUCH DAMAGE.  */
 #include
 #include
 #include
+#include
 
 #include "filenames.h"
 
@@ -458,16 +459,29 @@ test5 (void)
   return failures;
 }
 
+#define MIN_DESCRIPTOR 3
+#define MAX_DESCRIPTOR 10
+
+static int fstat_status[MAX_DESCRIPTOR];
+
+/* Check files that are available.  */
+
+static void
+check_available_files (void)
+{
+  struct stat s;
+  for (unsigned i = MIN_DESCRIPTOR; i < MAX_DESCRIPTOR; i++)
+    fstat_status[i] = fstat (i, &s);
+}
+
 /* Check that are no files left open.  */
 
 static void
 check_open_files (void)
 {
-  int i;
-
-  for (i = 3; i < 10; i++)
+  for (unsigned i = MIN_DESCRIPTOR; i < MAX_DESCRIPTOR; i++)
     {
-      if (close (i) == 0)
+      if (fstat_status[i] != 0 && close (i) == 0)
         {
           fprintf (stderr,
                    "ERROR: descriptor %d still open after tests complete\n",
@@ -482,6 +496,8 @@ check_open_files (void)
 int
 main (int argc ATTRIBUTE_UNUSED, char **argv)
 {
+  check_available_files ();
+
   state = backtrace_create_state (argv[0], BACKTRACE_SUPPORTS_THREADS,
                                   error_callback_create, NULL);
 
--
2.33.1
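The approach Ian suggests (record at entry which descriptors are already open, then at exit only complain about the others) can be sketched in isolation as follows. This is an illustrative variant rather than the patch itself: the function names are hypothetical, and the final check uses fstat instead of close so that it has no side effects.

```c
#include <assert.h>
#include <sys/stat.h>
#include <unistd.h>

#define MIN_DESCRIPTOR 3
#define MAX_DESCRIPTOR 10

/* Nonzero if the descriptor was already open when we started, for
   example because make's jobserver passed it to us.  */
static int open_at_entry[MAX_DESCRIPTOR];

/* Record which descriptors in [MIN_DESCRIPTOR, MAX_DESCRIPTOR) are
   already open.  Call this first thing in main.  */
static void record_inherited_descriptors (void)
{
  struct stat s;
  assert (MIN_DESCRIPTOR < MAX_DESCRIPTOR);
  for (int fd = MIN_DESCRIPTOR; fd < MAX_DESCRIPTOR; fd++)
    open_at_entry[fd] = (fstat (fd, &s) == 0);
}

/* Count descriptors that are open now but were not open at entry.
   Inherited descriptors are never reported as leaks.  */
static int count_leaked_descriptors (void)
{
  struct stat s;
  int leaked = 0;
  for (int fd = MIN_DESCRIPTOR; fd < MAX_DESCRIPTOR; fd++)
    if (!open_at_entry[fd] && fstat (fd, &s) == 0)
      leaked++;
  return leaked;
}
```

A test driver would call record_inherited_descriptors before creating any state and treat a nonzero count_leaked_descriptors at the end as a failure.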
Re: José's pending bind(C) patches / status (was: Re: [Patch, fortran V3] PR fortran/100683 - Array initialization refuses valid)
Hi Tobias,

My disappearance is partly responsible for the backlog. I told José that I would start working on it some months ago but just have not had the time. I can do some of the reviews but still do not have much time to spare. Perhaps you could divide them up between us.

Andrew Benson has been working on some standards issues associated with a patch of mine that sorts out finalization for intrinsic assignment - PR64290. The main issue was that of finalization of finalizable types/classes that themselves have finalizable array components. I believe that the patch has it right, following a comparison between the (differing!) behaviour of other brands. We have also found a case where gfortran, with the patch applied, still does not finalize when it should. I will work up a fix for this and will coordinate with Andrew to produce testcases as necessary, well before 15th November.

Best regards

Paul

On Fri, 22 Oct 2021 at 08:42, Tobias Burnus wrote:
> Hi José, hi all,
>
> especially since my patch which moved the descriptor conversion from
> libgfortran to gfortran is in, I wonder whether there are still patches
> to be applied and useful testcases; I assume there are more issues in
> Bugzilla, especially as I filled myself some (often related to
> polymorphism or assumed rank). While I (and also Sandra) try to resolve
> some bugs and look at testcases:
>
> it would be helpful if others – in particular José – could check
> whether: (a) PRs can be now closed, (b) testcases exist which still
> should be added, (c) patches exist which still are applicable (even if
> they need to be revised). (Partial/full list below.)
>
> I hope that we can really cleanup this backlog – and possibly fix also
> some of the remaining bugs before GCC 12 is released!
>
> And kudos to José for the bug reports, testcases and patches – and sorry
> for slow reviews. I hope we resolve the pending issues and be quicker in
> future.
>
> Tobias
>
> PS: Old and probably current but incomplete pending patch list:
>
> On 21.06.21 17:21, José Rui Faustino de Sousa wrote:
> > On 21/06/21 12:37, Tobias Burnus wrote:
> >> Thus: Do you have a list of patches pending review?
> >
> > https://gcc.gnu.org/pipermail/fortran/2021-April/055924.html
> > https://gcc.gnu.org/pipermail/fortran/2021-April/055933.html
> > https://gcc.gnu.org/pipermail/fortran/2021-June/056168.html
> > https://gcc.gnu.org/pipermail/fortran/2021-June/056167.html
> > https://gcc.gnu.org/pipermail/fortran/2021-June/056163.html
> > https://gcc.gnu.org/pipermail/fortran/2021-June/056162.html
> > https://gcc.gnu.org/pipermail/fortran/2021-June/056155.html
> > https://gcc.gnu.org/pipermail/fortran/2021-June/056154.html
> > https://gcc.gnu.org/pipermail/fortran/2021-June/056152.html
> > https://gcc.gnu.org/pipermail/fortran/2021-June/056159.html
> > https://gcc.gnu.org/pipermail/fortran/2021-April/055982.html
> > https://gcc.gnu.org/pipermail/fortran/2021-April/055949.html
> > https://gcc.gnu.org/pipermail/fortran/2021-April/055946.html
> > https://gcc.gnu.org/pipermail/fortran/2021-April/055934.html
> > https://gcc.gnu.org/pipermail/fortran/2021-June/056169.html
> > https://gcc.gnu.org/pipermail/fortran/2021-April/055921.html
> >
> > I am not 100% sure this is all of them but it should be most.

--
"If you can't explain it simply, you don't understand it well enough" - Albert Einstein
Re: [aarch64] PR102376 - Emit better diagnostic for arch extensions in target attr
On Wed, 20 Oct 2021 at 15:05, Richard Sandiford wrote:
>
> Prathamesh Kulkarni writes:
> > On Tue, 19 Oct 2021 at 19:58, Richard Sandiford wrote:
> >>
> >> Prathamesh Kulkarni writes:
> >> > Hi,
> >> > The attached patch emits a more verbose diagnostic for target attribute that
> >> > is an architecture extension needing a leading '+'.
> >> >
> >> > For the following test,
> >> > void calculate(void) __attribute__ ((__target__ ("sve")));
> >> >
> >> > With patch, the compiler now emits:
> >> > 102376.c:1:1: error: arch extension ‘sve’ should be prepended with ‘+’
> >> > 1 | void calculate(void) __attribute__ ((__target__ ("sve")));
> >> > | ^~~~
> >> >
> >> > instead of:
> >> > 102376.c:1:1: error: pragma or attribute ‘target("sve")’ is not valid
> >> > 1 | void calculate(void) __attribute__ ((__target__ ("sve")));
> >> > | ^~~~
> >>
> >> Nice :-)
> >>
> >> > (This isn't specific to sve though).
> >> > OK to commit after bootstrap+test ?
> >> >
> >> > Thanks,
> >> > Prathamesh
> >> >
> >> > diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> >> > index a9a1800af53..975f7faf968 100644
> >> > --- a/gcc/config/aarch64/aarch64.c
> >> > +++ b/gcc/config/aarch64/aarch64.c
> >> > @@ -17821,7 +17821,16 @@ aarch64_process_target_attr (tree args)
> >> >        num_attrs++;
> >> >        if (!aarch64_process_one_target_attr (token))
> >> >          {
> >> > -          error ("pragma or attribute % is not valid", token);
> >> > +          /* Check if token is possibly an arch extension without
> >> > +             leading '+'.  */
> >> > +          char *str = (char *) xmalloc (strlen (token) + 2);
> >> > +          str[0] = '+';
> >> > +          strcpy(str + 1, token);
> >>
> >> I think std::string would be better here, e.g.:
> >>
> >>   auto with_plus = std::string ("+") + token;
> >>
> >> > +          if (aarch64_handle_attr_isa_flags (str))
> >> > +            error("arch extension %<%s%> should be prepended with %<+%>", token);
> >>
> >> Nit: should be a space before the “(”.
> >>
> >> In principle, a fixit hint would have been nice here, but I don't think
> >> we have enough information to provide one.  (Just saying for the record.)
> > Thanks for the suggestions.
> > Does the attached patch look OK ?
>
> Looks good apart from a couple of formatting nits.
> >
> > Thanks,
> > Prathamesh
> >>
> >> Thanks,
> >> Richard
> >>
> >> > +          else
> >> > +            error ("pragma or attribute % is not valid", token);
> >> > +          free (str);
> >> >            return false;
> >> >          }
> >> >
> >
> > [aarch64] PR102376 - Emit better diagnostics for arch extension in target attribute.
> >
> > gcc/ChangeLog:
> >     PR target/102376
> >     * config/aarch64/aarch64.c (aarch64_handle_attr_isa_flags): Change str's
> >     type to const char *.
> >     (aarch64_process_target_attr): Check if token is possibly an arch extension
> >     without leading '+' and emit diagnostic accordingly.
> >
> > gcc/testsuite/ChangeLog:
> >     PR target/102376
> >     * gcc.target/aarch64/pr102376.c: New test.
>
> > diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> > index a9a1800af53..b72079bc466 100644
> > --- a/gcc/config/aarch64/aarch64.c
> > +++ b/gcc/config/aarch64/aarch64.c
> > @@ -17548,7 +17548,7 @@ aarch64_handle_attr_tune (const char *str)
> >     modified.  */
> >
> >  static bool
> > -aarch64_handle_attr_isa_flags (char *str)
> > +aarch64_handle_attr_isa_flags (const char *str)
> >  {
> >    enum aarch64_parse_opt_result parse_res;
> >    uint64_t isa_flags = aarch64_isa_flags;
> > @@ -17821,7 +17821,13 @@ aarch64_process_target_attr (tree args)
> >        num_attrs++;
> >        if (!aarch64_process_one_target_attr (token))
> >          {
> > -          error ("pragma or attribute % is not valid", token);
> > +          /* Check if token is possibly an arch extension without
> > +             leading '+'.  */
> > +          auto with_plus = std::string("+") + token;
>
> Should be a space before “(”.
>
> > +          if (aarch64_handle_attr_isa_flags (with_plus.c_str ()))
> > +            error ("arch extension %<%s%> should be prepended with %<+%>", token);
>
> Long line, should be:
>
>     error ("arch extension %<%s%> should be prepended with %<+%>",
>            token);
>
> OK with those changes, thanks.

Thanks, the patch regressed some target attr tests because it emitted diagnostics twice from aarch64_handle_attr_isa_flags.
For example, spellcheck_1.c:

__attribute__((target ("arch=armv8-a-typo"))) void foo () {}

results in:

spellcheck_1.c:5:1: error: invalid name ("armv8-a-typo") in ‘target("arch=")’ pragma or attribute
5 | {
| ^
spellcheck_1.c:5:1: note: valid arguments are: armv8-a armv8.1-a armv8.2-a armv8.3-a armv8.4-a armv8.5-a armv8.6-a armv8.7-a armv8-r armv9-a
spellcheck_1.c:5:1: error: invalid feature modifier arch=armv8-a-typo of value ("+arch=armv8-a-typo") in ‘target()’ pragma or attribute
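The idea under review (retry the rejected token with a leading '+' and, if that parses, suggest the '+') can be sketched outside GCC as follows. Everything here is a stand-in: the extension table, parse_isa_flags and diagnose_invalid_token are hypothetical names, not GCC functions. The probe in this sketch is deliberately silent, which avoids the doubled diagnostic shown above; how the actual patch resolves that is not part of this excerpt.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Illustrative stand-in for the real table of architecture extensions.  */
static const char *const known_extensions[] = { "sve", "crc", "simd" };

/* Return 1 if STR is '+' followed by a known extension name.  This
   probe never prints anything itself.  */
static int parse_isa_flags (const char *str)
{
  if (str[0] != '+')
    return 0;
  for (size_t i = 0; i < sizeof known_extensions / sizeof known_extensions[0]; i++)
    if (strcmp (str + 1, known_extensions[i]) == 0)
      return 1;
  return 0;
}

/* Write the diagnostic for an already-rejected TOKEN into MSG.  */
static void diagnose_invalid_token (const char *token, char *msg, size_t len)
{
  char with_plus[64];
  assert (msg != NULL && len > 0);
  /* Retry with a leading '+' to see whether the user simply forgot it.  */
  snprintf (with_plus, sizeof with_plus, "+%s", token);
  if (parse_isa_flags (with_plus))
    snprintf (msg, len, "arch extension '%s' should be prepended with '+'", token);
  else
    snprintf (msg, len, "pragma or attribute 'target(\"%s\")' is not valid", token);
}
```

With this shape, a token such as "arch=armv8-a-typo" falls through to the generic message exactly once, because the probe cannot emit an error of its own.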
Re: [match.pd] PR83750 - CSE erf/erfc pair
On Fri, 22 Oct 2021, Prathamesh Kulkarni wrote: > On Wed, 20 Oct 2021 at 18:21, Richard Biener wrote: > > > > On Wed, 20 Oct 2021, Prathamesh Kulkarni wrote: > > > > > On Tue, 19 Oct 2021 at 16:55, Richard Biener wrote: > > > > > > > > On Tue, 19 Oct 2021, Prathamesh Kulkarni wrote: > > > > > > > > > On Tue, 19 Oct 2021 at 13:02, Richard Biener > > > > > wrote: > > > > > > > > > > > > On Tue, Oct 19, 2021 at 9:03 AM Prathamesh Kulkarni via Gcc-patches > > > > > > wrote: > > > > > > > > > > > > > > On Mon, 18 Oct 2021 at 17:23, Richard Biener > > > > > > > wrote: > > > > > > > > > > > > > > > > On Mon, 18 Oct 2021, Prathamesh Kulkarni wrote: > > > > > > > > > > > > > > > > > On Mon, 18 Oct 2021 at 17:10, Richard Biener > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > On Mon, 18 Oct 2021, Prathamesh Kulkarni wrote: > > > > > > > > > > > > > > > > > > > > > On Mon, 18 Oct 2021 at 16:18, Richard Biener > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > > > > On Mon, 18 Oct 2021, Prathamesh Kulkarni wrote: > > > > > > > > > > > > > > > > > > > > > > > > > Hi Richard, > > > > > > > > > > > > > As suggested in PR, I have attached WIP patch that > > > > > > > > > > > > > adds two patterns > > > > > > > > > > > > > to match.pd: > > > > > > > > > > > > > erfc(x) --> 1 - erf(x) if canonicalize_math_p() and, > > > > > > > > > > > > > 1 - erf(x) --> erfc(x) if !canonicalize_math_p(). 
> > > > > > > > > > > > > > > > > > > > > > > > > > This works to remove call to erfc for the following > > > > > > > > > > > > > test: > > > > > > > > > > > > > double f(double x) > > > > > > > > > > > > > { > > > > > > > > > > > > > double g(double, double); > > > > > > > > > > > > > > > > > > > > > > > > > > double t1 = __builtin_erf (x); > > > > > > > > > > > > > double t2 = __builtin_erfc (x); > > > > > > > > > > > > > return g(t1, t2); > > > > > > > > > > > > > } > > > > > > > > > > > > > > > > > > > > > > > > > > with .optimized dump shows: > > > > > > > > > > > > > t1_2 = __builtin_erf (x_1(D)); > > > > > > > > > > > > > t2_3 = 1.0e+0 - t1_2; > > > > > > > > > > > > > > > > > > > > > > > > > > However, for the following test: > > > > > > > > > > > > > double f(double x) > > > > > > > > > > > > > { > > > > > > > > > > > > > double g(double, double); > > > > > > > > > > > > > > > > > > > > > > > > > > double t1 = __builtin_erfc (x); > > > > > > > > > > > > > return t1; > > > > > > > > > > > > > } > > > > > > > > > > > > > > > > > > > > > > > > > > It canonicalizes erfc(x) to 1 - erf(x), but does not > > > > > > > > > > > > > transform 1 - > > > > > > > > > > > > > erf(x) to erfc(x) again > > > > > > > > > > > > > post canonicalization. > > > > > > > > > > > > > -fdump-tree-folding shows that 1 - erf(x) --> erfc(x) > > > > > > > > > > > > > gets applied, > > > > > > > > > > > > > but then it tries to > > > > > > > > > > > > > resimplify erfc(x), which fails post > > > > > > > > > > > > > canonicalization. So we end up > > > > > > > > > > > > > with erfc(x) transformed to > > > > > > > > > > > > > 1 - erf(x) in .optimized dump, which I suppose isn't > > > > > > > > > > > > > ideal. > > > > > > > > > > > > > Could you suggest how to proceed ? > > > > > > > > > > > > > > > > > > > > > > > > I applied your patch manually and it does the intended > > > > > > > > > > > > simplifications so I wonder what I am missing? 
> > > > > > > > > > > Would it be OK to always fold erfc(x) -> 1 - erf(x) even > > > > > > > > > > > when there's > > > > > > > > > > > no erf(x) in the source ? > > > > > > > > > > > > > > > > > > > > I do think it's reasonable to expect erfc to be available > > > > > > > > > > when erf > > > > > > > > > > is and vice versa but note both are C99 specified functions > > > > > > > > > > (either > > > > > > > > > > requires -lm). > > > > > > > > > OK, thanks. Would it be OK to commit the patch after > > > > > > > > > bootstrap+test ? > > > > > > > > > > > > > > > > Yes, but I'm confused because you say the patch doesn't work > > > > > > > > for you? > > > > > > > The patch works for me to CSE erf/erfc pair. > > > > > > > However when there's only erfc in the source, it canonicalizes > > > > > > > erfc(x) > > > > > > > to 1 - erf(x) but later fails to uncanonicalize 1 - erf(x) back to > > > > > > > erfc(x) > > > > > > > with -O3 -funsafe-math-optimizations. > > > > > > > > > > > > > > For, > > > > > > > t1 = __builtin_erfc(x), > > > > > > > > > > > > > > .optimized dump shows: > > > > > > > _2 = __builtin_erf (x_1(D)); > > > > > > > t1_3 = 1.0e+0 - _2; > > > > > > > > > > > > > > and for, > > > > > > > double t1 = x + __builtin_erfc(x); > > > > > > > > > > > > > > .optimized dump shows: > > > > > > > _3 = __builtin_erf (x_2(D)); > > > > > > > _7 = x_2(D) + 1.0e+0; > > > > > > > t1_4 = _7 - _3; > > > > > > > > > > > > > > I assume in both cases, we want erfc in the code-gen instead ? > > > > > > > I think the reason uncaonicalization fails is be
[PATCH] bootstrap/102681 - properly CSE PHIs with default def args
The PR shows that we fail to CSE PHIs containing (different) default definitions because of how we now handle the on-demand build of VN_INFO. The following fixes this in the same way the PHI visitation code does.

On gcc.dg/ubsan/pr81981.c this causes one expected warning to be elided since the uninit pass sees the change

   [local count: 1073741824]:
   # u$0_2 = PHI
-  # cstore_11 = PHI
   v = u$0_2;
-  return cstore_11;
+  return u$0_2;

and thus only one of the conditionally uninitialized uses (the other became dead). I have XFAILed the missing diagnostic, I don't see a way to preserve that.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

2021-10-22  Richard Biener

        PR bootstrap/102681
        * tree-ssa-sccvn.c (vn_phi_insert): For undefined SSA args record VN_TOP.
        (vn_phi_lookup): Likewise.

        * gcc.dg/tree-ssa/ssa-fre-97.c: New testcase.
        * gcc.dg/ubsan/pr81981.c: XFAIL one case.
---
 gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-97.c | 19 +++
 gcc/testsuite/gcc.dg/ubsan/pr81981.c | 2 +-
 gcc/tree-ssa-sccvn.c | 14 --
 3 files changed, 32 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-97.c

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-97.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-97.c
new file mode 100644
index 000..2f09c8baeb8
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-97.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* ethread threading does not yet catch this but it might at some point.  */
+/* { dg-options "-O -fdump-tree-fre1-details -fno-thread-jumps" } */
+
+int foo (int b, int x)
+{
+  int i, j;
+  if (b)
+    i = x;
+  if (b)
+    j = x;
+  return j == i;
+}
+
+/* Even with different undefs we should CSE a PHI node with the
+   same controlling condition.  */
+
+/* { dg-final { scan-tree-dump "Replaced redundant PHI node" "fre1" } } */
+/* { dg-final { scan-tree-dump "return 1;" "fre1" } } */
diff --git a/gcc/testsuite/gcc.dg/ubsan/pr81981.c b/gcc/testsuite/gcc.dg/ubsan/pr81981.c
index 8a6597c84c8..d201efb3f65 100644
--- a/gcc/testsuite/gcc.dg/ubsan/pr81981.c
+++ b/gcc/testsuite/gcc.dg/ubsan/pr81981.c
@@ -16,6 +16,6 @@ foo (int i)
       u[0] = i;
     }
 
-  v = u[0];    /* { dg-warning "may be used uninitialized" } */
+  v = u[0];    /* { dg-warning "may be used uninitialized" "" { xfail *-*-* } } */
   return t[0]; /* { dg-warning "may be used uninitialized" } */
 }
diff --git a/gcc/tree-ssa-sccvn.c b/gcc/tree-ssa-sccvn.c
index ae0172a143e..893b1d0ddaa 100644
--- a/gcc/tree-ssa-sccvn.c
+++ b/gcc/tree-ssa-sccvn.c
@@ -4499,7 +4499,12 @@ vn_phi_lookup (gimple *phi, bool backedges_varying_p)
       tree def = PHI_ARG_DEF_FROM_EDGE (phi, e);
       if (TREE_CODE (def) == SSA_NAME
           && (!backedges_varying_p || !(e->flags & EDGE_DFS_BACK)))
-        def = SSA_VAL (def);
+        {
+          if (ssa_undefined_value_p (def, false))
+            def = VN_TOP;
+          else
+            def = SSA_VAL (def);
+        }
       vp1->phiargs[e->dest_idx] = def;
     }
   vp1->type = TREE_TYPE (gimple_phi_result (phi));
@@ -4543,7 +4548,12 @@ vn_phi_insert (gimple *phi, tree result, bool backedges_varying_p)
       tree def = PHI_ARG_DEF_FROM_EDGE (phi, e);
       if (TREE_CODE (def) == SSA_NAME
           && (!backedges_varying_p || !(e->flags & EDGE_DFS_BACK)))
-        def = SSA_VAL (def);
+        {
+          if (ssa_undefined_value_p (def, false))
+            def = VN_TOP;
+          else
+            def = SSA_VAL (def);
+        }
       vp1->phiargs[e->dest_idx] = def;
     }
   vp1->value_id = VN_INFO (result)->value_id;
--
2.31.1
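The effect of recording VN_TOP for undefined arguments can be illustrated with a toy model: two PHIs whose undefined arguments are different default definitions compare equal once every undefined argument is mapped to one shared sentinel. The struct, the integer value numbers and the function names below are assumptions made for this sketch. The real vn_phi comparison also takes the block and controlling condition into account, which is left out here.

```c
#include <assert.h>
#include <stdbool.h>

#define VN_TOP (-1)   /* sentinel meaning "undefined" in this toy model */
#define MAX_ARGS 4

/* A PHI with up to MAX_ARGS arguments.  args[] holds value numbers and
   undefined[i] says argument i is the default definition of an
   uninitialized variable.  */
struct phi
{
  int nargs;
  int args[MAX_ARGS];
  bool undefined[MAX_ARGS];
};

/* Map every undefined argument to VN_TOP and keep the others.  */
static void canonicalize_phi_args (const struct phi *p, int out[MAX_ARGS])
{
  assert (p->nargs <= MAX_ARGS);
  for (int i = 0; i < p->nargs; i++)
    out[i] = p->undefined[i] ? VN_TOP : p->args[i];
}

/* Two PHIs are treated as equal if their canonicalized arguments match.  */
static bool phis_equal (const struct phi *a, const struct phi *b)
{
  int ca[MAX_ARGS], cb[MAX_ARGS];
  if (a->nargs != b->nargs)
    return false;
  canonicalize_phi_args (a, ca);
  canonicalize_phi_args (b, cb);
  for (int i = 0; i < a->nargs; i++)
    if (ca[i] != cb[i])
      return false;
  return true;
}
```

In terms of the new testcase, i and j are each a PHI of x and a distinct undefined default definition, so without the canonicalization step they would not compare equal.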
Re: [PATCH 1v2/3][vect] Add main vectorized loop unrolling
"Andre Vieira (lists)" writes:
> On 15/10/2021 09:48, Richard Biener wrote:
>> On Tue, 12 Oct 2021, Andre Vieira (lists) wrote:
>>
>>> Hi Richi,
>>>
>>> I think this is what you meant, I now hide all the unrolling cost
>>> calculations in the existing target hooks for costs.  I did need to
>>> adjust 'finish_cost' to take the loop_vinfo so the target's
>>> implementations are able to set the newly renamed
>>> 'suggested_unroll_factor'.
>>>
>>> Also added the checks for the epilogue's VF.
>>>
>>> Is this more like what you had in mind?
>> Not exactly (sorry..).  For the target hook I think we don't want to
>> pass vec_info but instead another output parameter like the existing
>> ones.
>>
>> vect_estimate_min_profitable_iters should then via
>> vect_analyze_loop_costing and vect_analyze_loop_2 report the unroll
>> suggestion to vect_analyze_loop which should then, if the suggestion
>> was > 1, instead of iterating to the next vector mode run again
>> with a fixed VF (old VF times suggested unroll factor - there's
>> min_vf in vect_analyze_loop_2 which we should adjust to
>> the old VF times two for example and maybe store the suggested
>> factor as hint) - if it succeeds the result will end up in the
>> list of considered modes (where we now may have more than one
>> entry for the same mode but a different VF), we probably want to
>> only consider more unrolling once.
>>
>> For simplicity I'd probably set min_vf = max_vf = old VF * suggested
>> factor, thus take the targets request literally.
>>
>> Richard.
>
> Hi,
>
> I now pass an output parameter to finish_costs and route it through the
> various calls up to vect_analyze_loop.  I tried to rework
> vect_determine_vectorization_factor and noticed that merely setting
> min_vf and max_vf is not enough, we only use these to check whether the
> vectorization factor is within range, well actually we only use max_vf
> at that stage.  We only seem to use 'min_vf' to make sure the
> data_references are valid.
>
> I am not sure my changes are the most appropriate here, for instance I am
> pretty sure the checks for max and min vf I added in
> vect_determine_vectorization_factor are currently superfluous as they will
> pass by design, but thought they might be good future proofing?
>
> Also I changed how we compare against max_vf, rather than relying on the
> 'MAX_VECTORIZATION' I decided to use the estimated_poly_value with
> POLY_VALUE_MAX, to be able to bound it further in case we have knowledge
> of the VL.  I am not entirely sure about the validity of this change, maybe
> we are better off keeping the MAX_VECTORIZATION in place and not making any
> changes to max_vf for unrolling.

Yeah, estimated_poly_value is just an estimate (even for POLY_VALUE_MAX)
rather than a guarantee.  We can't rely on it for correctness.

Richard
Re: [PATCH 1v2/3][vect] Add main vectorized loop unrolling
Richard Biener writes:
> That said, the overall flow is OK now, some details about the
> max_vf check and where to compute the unrolled VF need to be
> fleshed out.  And then there's the main analysis loop which,
> frankly, is a mess right now, even before your patch :/

Yeah, the loop is certainly ripe for a rewrite :-)

> I'm thinking of rewriting the analysis loop in vect_analyze_loop
> to use a worklist initially seeded by the vector_modes[] but
> that we can push things like as-main-loop, unrolled and
> epilogue analysis to.  Maybe have the worklist specify
> pairs of mode and kind or tuples of mode, min-VF and kind where
> 'kind' is as-main/epilogue/unroll (though maybe 'kind' is
> redundant there).  Possibly as preparatory step.

Sounds good.  I think we can also drop some of the complexity if we're
prepared to analyse candidate replacements for the main loop separately
from candidate epilogue loops (even if the two candidates have the same
mode and VF, meaning that a lot of work would be repeated).

Thanks,
Richard
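To make the worklist idea above concrete, here is a minimal sketch of how such a loop might be driven.  It is only an illustration under stated assumptions: work_item, analysis_result, analyze_fn and run_worklist are hypothetical names, modes are plain strings, and the analysis itself is a caller-supplied callback.  None of this is the vectorizer's actual interface.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <string>
#include <vector>

// Hypothetical 'kind' tag, following the as-main/epilogue/unroll wording.
enum class analysis_kind { as_main, unrolled, epilogue };

// Hypothetical worklist entry: a (mode, min-VF, kind) tuple.
struct work_item
{
  std::string mode;
  unsigned min_vf;
  analysis_kind kind;
};

// Hypothetical outcome of analysing one entry.
struct analysis_result
{
  bool ok;                    // did the analysis succeed?
  unsigned vf;                // vectorization factor that was chosen
  unsigned suggested_unroll;  // target's suggestion; 1 means "none"
};

using analyze_fn = std::function<analysis_result (const work_item &)>;

// Seed the worklist with one as-main entry per mode, then drain it.
// A successful as-main entry with an unroll suggestion > 1 pushes a single
// follow-up entry with the VF fixed to old VF * suggested factor.
// Unrolled entries never push further work, so unrolling is tried once.
std::vector<work_item>
run_worklist (const std::vector<std::string> &modes, const analyze_fn &analyze)
{
  std::deque<work_item> worklist;
  for (const std::string &m : modes)
    worklist.push_back ({m, 1, analysis_kind::as_main});

  std::vector<work_item> accepted;
  while (!worklist.empty ())
    {
      work_item item = worklist.front ();
      worklist.pop_front ();

      analysis_result r = analyze (item);
      if (!r.ok)
        continue;

      // Record the candidate together with the VF it ended up with.
      accepted.push_back ({item.mode, r.vf, item.kind});

      if (item.kind == analysis_kind::as_main && r.suggested_unroll > 1)
        worklist.push_back ({item.mode, r.vf * r.suggested_unroll,
                             analysis_kind::unrolled});
    }
  return accepted;
}
```

With this shape the same mode can legitimately appear twice in the result, once per VF, which matches the "more than one entry for the same mode but a different VF" situation described earlier in the thread.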
[COMMITTED] Disregard incoming equivalences to a path when defining a new one.
The equivalence oracle creates a new equiv set at each def point,
killing any incoming equivalences, however in the path sensitive
oracle we create brand new equivalences at each PHI:

    BB4: BB8: x_5 = PHI

Here we note that x_5 == y_8 at the end of the path.

The current code is intersecting this new equivalence with previously
known equivalences coming into the path.  This is incorrect, as this
is a new definition.  This patch kills any known equivalence before
we register a new one.

This hasn't caused problems so far, but upcoming changes to the
pipeline have us threading more aggressively and triggering corner
cases where this causes incorrect code.

I have tested this patch with the usual regstrap cycle.  I have also
hacked a compiler comparing the old and new behavior to see if we were
previously threading paths where the decision was made due to invalid
equivalences.  Luckily, there were no such paths, but there were 22
paths in a set of .ii files where disregarding incoming relations
allowed us to thread the path.  This is a minuscule improvement,
but we moved a handful of threadable paths earlier in the pipeline,
which is always good.

Tested on x86-64 Linux.

Co-authored-by: Andrew MacLeod

gcc/ChangeLog:

	* gimple-range-path.cc (path_range_query::compute_phi_relations):
	Kill any global relations we may know before registering a new
	one.
	* value-relation.cc (path_oracle::killing_def): New.
	* value-relation.h (path_oracle::killing_def): New.
--- gcc/gimple-range-path.cc | 10 +- gcc/value-relation.cc| 23 +++ gcc/value-relation.h | 1 + 3 files changed, 33 insertions(+), 1 deletion(-) diff --git a/gcc/gimple-range-path.cc b/gcc/gimple-range-path.cc index 694271306a7..557338993ae 100644 --- a/gcc/gimple-range-path.cc +++ b/gcc/gimple-range-path.cc @@ -698,7 +698,15 @@ path_range_query::compute_phi_relations (basic_block bb, basic_block prev) tree arg = gimple_phi_arg_def (phi, i); if (gimple_range_ssa_p (arg)) - m_oracle->register_relation (entry, EQ_EXPR, arg, result); + { + if (dump_file && (dump_flags & TDF_DETAILS)) + fprintf (dump_file, " from bb%d:", bb->index); + + // Throw away any previous relation. + get_path_oracle ()->killing_def (result); + + m_oracle->register_relation (entry, EQ_EXPR, arg, result); + } break; } diff --git a/gcc/value-relation.cc b/gcc/value-relation.cc index ac5f3f9afc0..2acf375ca9a 100644 --- a/gcc/value-relation.cc +++ b/gcc/value-relation.cc @@ -1285,6 +1285,29 @@ path_oracle::register_equiv (basic_block bb, tree ssa1, tree ssa2) bitmap_ior_into (m_equiv.m_names, b); } +// Register killing definition of an SSA_NAME. + +void +path_oracle::killing_def (tree ssa) +{ + if (dump_file && (dump_flags & TDF_DETAILS)) +{ + fprintf (dump_file, " Registering killing_def (path_oracle) "); + print_generic_expr (dump_file, ssa, TDF_SLIM); + fprintf (dump_file, "\n"); +} + + bitmap b = BITMAP_ALLOC (&m_bitmaps); + bitmap_set_bit (b, SSA_NAME_VERSION (ssa)); + equiv_chain *ptr = (equiv_chain *) obstack_alloc (&m_chain_obstack, + sizeof (equiv_chain)); + ptr->m_names = b; + ptr->m_bb = NULL; + ptr->m_next = m_equiv.m_next; + m_equiv.m_next = ptr; + bitmap_ior_into (m_equiv.m_names, b); +} + // Register relation K between SSA1 and SSA2, resolving unknowns by // querying from BB. 
diff --git a/gcc/value-relation.h b/gcc/value-relation.h index 53cefbfa7dc..97be3251144 100644 --- a/gcc/value-relation.h +++ b/gcc/value-relation.h @@ -222,6 +222,7 @@ public: ~path_oracle (); const_bitmap equiv_set (tree, basic_block); void register_relation (basic_block, relation_kind, tree, tree); + void killing_def (tree); relation_kind query_relation (basic_block, tree, tree); relation_kind query_relation (basic_block, const_bitmap, const_bitmap); void reset_path (); -- 2.31.1
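The effect of a killing definition can be illustrated with a small toy model.  This is only a sketch: toy_path_oracle is a hypothetical class that tracks equivalence classes over integer SSA version numbers, and it forgets a name by erasing it from its class, whereas the patch above pushes a fresh single-name equiv_chain.  The point is just the ordering: kill first, then register.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Toy stand-in for an equivalence oracle; not GCC's path_oracle.
class toy_path_oracle
{
  std::vector<std::set<int>> m_classes;

  // Return the index of the class containing NAME, or -1 if there is none.
  int find (int name) const
  {
    for (std::size_t i = 0; i < m_classes.size (); ++i)
      if (m_classes[i].count (name))
        return static_cast<int> (i);
    return -1;
  }

public:
  // NAME gets a new definition: forget everything known about it.
  void killing_def (int name)
  {
    int i = find (name);
    if (i >= 0)
      m_classes[i].erase (name);
  }

  // Record A == B, merging the classes they already belong to.
  void register_equiv (int a, int b)
  {
    int ia = find (a), ib = find (b);
    if (ia < 0 && ib < 0)
      m_classes.push_back ({a, b});
    else if (ia < 0)
      m_classes[ib].insert (a);
    else if (ib < 0)
      m_classes[ia].insert (b);
    else if (ia != ib)
      {
        m_classes[ia].insert (m_classes[ib].begin (), m_classes[ib].end ());
        m_classes[ib].clear ();
      }
  }

  // Are A and B known to be equal?
  bool equiv_p (int a, int b) const
  {
    if (a == b)
      return true;
    int ia = find (a);
    return ia >= 0 && ia == find (b);
  }
};
```

If x_5 enters the path known to equal some z_3 and the PHI then makes it equal to y_8, calling killing_def (5) before register_equiv (5, 8) leaves x_5 == y_8 only.  Skipping the kill would wrongly keep z_3 in the same class.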
Re: [match.pd] PR83750 - CSE erf/erfc pair
On Fri, 22 Oct 2021 at 14:56, Richard Biener wrote: > > On Fri, 22 Oct 2021, Prathamesh Kulkarni wrote: > > > On Wed, 20 Oct 2021 at 18:21, Richard Biener wrote: > > > > > > On Wed, 20 Oct 2021, Prathamesh Kulkarni wrote: > > > > > > > On Tue, 19 Oct 2021 at 16:55, Richard Biener wrote: > > > > > > > > > > On Tue, 19 Oct 2021, Prathamesh Kulkarni wrote: > > > > > > > > > > > On Tue, 19 Oct 2021 at 13:02, Richard Biener > > > > > > wrote: > > > > > > > > > > > > > > On Tue, Oct 19, 2021 at 9:03 AM Prathamesh Kulkarni via > > > > > > > Gcc-patches > > > > > > > wrote: > > > > > > > > > > > > > > > > On Mon, 18 Oct 2021 at 17:23, Richard Biener > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > On Mon, 18 Oct 2021, Prathamesh Kulkarni wrote: > > > > > > > > > > > > > > > > > > > On Mon, 18 Oct 2021 at 17:10, Richard Biener > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > > On Mon, 18 Oct 2021, Prathamesh Kulkarni wrote: > > > > > > > > > > > > > > > > > > > > > > > On Mon, 18 Oct 2021 at 16:18, Richard Biener > > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > > > > > > On Mon, 18 Oct 2021, Prathamesh Kulkarni wrote: > > > > > > > > > > > > > > > > > > > > > > > > > > > Hi Richard, > > > > > > > > > > > > > > As suggested in PR, I have attached WIP patch that > > > > > > > > > > > > > > adds two patterns > > > > > > > > > > > > > > to match.pd: > > > > > > > > > > > > > > erfc(x) --> 1 - erf(x) if canonicalize_math_p() and, > > > > > > > > > > > > > > 1 - erf(x) --> erfc(x) if !canonicalize_math_p(). 
> > > > > > > > > > > > > > > > > > > > > > > > > > > > This works to remove call to erfc for the following > > > > > > > > > > > > > > test: > > > > > > > > > > > > > > double f(double x) > > > > > > > > > > > > > > { > > > > > > > > > > > > > > double g(double, double); > > > > > > > > > > > > > > > > > > > > > > > > > > > > double t1 = __builtin_erf (x); > > > > > > > > > > > > > > double t2 = __builtin_erfc (x); > > > > > > > > > > > > > > return g(t1, t2); > > > > > > > > > > > > > > } > > > > > > > > > > > > > > > > > > > > > > > > > > > > with .optimized dump shows: > > > > > > > > > > > > > > t1_2 = __builtin_erf (x_1(D)); > > > > > > > > > > > > > > t2_3 = 1.0e+0 - t1_2; > > > > > > > > > > > > > > > > > > > > > > > > > > > > However, for the following test: > > > > > > > > > > > > > > double f(double x) > > > > > > > > > > > > > > { > > > > > > > > > > > > > > double g(double, double); > > > > > > > > > > > > > > > > > > > > > > > > > > > > double t1 = __builtin_erfc (x); > > > > > > > > > > > > > > return t1; > > > > > > > > > > > > > > } > > > > > > > > > > > > > > > > > > > > > > > > > > > > It canonicalizes erfc(x) to 1 - erf(x), but does > > > > > > > > > > > > > > not transform 1 - > > > > > > > > > > > > > > erf(x) to erfc(x) again > > > > > > > > > > > > > > post canonicalization. > > > > > > > > > > > > > > -fdump-tree-folding shows that 1 - erf(x) --> > > > > > > > > > > > > > > erfc(x) gets applied, > > > > > > > > > > > > > > but then it tries to > > > > > > > > > > > > > > resimplify erfc(x), which fails post > > > > > > > > > > > > > > canonicalization. So we end up > > > > > > > > > > > > > > with erfc(x) transformed to > > > > > > > > > > > > > > 1 - erf(x) in .optimized dump, which I suppose > > > > > > > > > > > > > > isn't ideal. > > > > > > > > > > > > > > Could you suggest how to proceed ? 
> > > > > > > > > > > > > > > > > > > > > > > > > > I applied your patch manually and it does the intended > > > > > > > > > > > > > simplifications so I wonder what I am missing? > > > > > > > > > > > > Would it be OK to always fold erfc(x) -> 1 - erf(x) > > > > > > > > > > > > even when there's > > > > > > > > > > > > no erf(x) in the source ? > > > > > > > > > > > > > > > > > > > > > > I do think it's reasonable to expect erfc to be available > > > > > > > > > > > when erf > > > > > > > > > > > is and vice versa but note both are C99 specified > > > > > > > > > > > functions (either > > > > > > > > > > > requires -lm). > > > > > > > > > > OK, thanks. Would it be OK to commit the patch after > > > > > > > > > > bootstrap+test ? > > > > > > > > > > > > > > > > > > Yes, but I'm confused because you say the patch doesn't work > > > > > > > > > for you? > > > > > > > > The patch works for me to CSE erf/erfc pair. > > > > > > > > However when there's only erfc in the source, it canonicalizes > > > > > > > > erfc(x) > > > > > > > > to 1 - erf(x) but later fails to uncanonicalize 1 - erf(x) back > > > > > > > > to > > > > > > > > erfc(x) > > > > > > > > with -O3 -funsafe-math-optimizations. > > > > > > > > > > > > > > > > For, > > > > > > > > t1 = __builtin_erfc(x), > > > > > > > > > > > > > > > > .optimized dump shows: > > > > > > > > _2 = __builtin_erf (x_1(D)); > > > > > > > > t1_3 = 1.0e+0 - _2; > > > > > > > > > > > > > > > > and for, > > > > > > > > double t1 = x + __builtin_erfc(x); > > > > > > > > >
Re: [PATCH] Convert strlen pass from evrp to ranger.
On Fri, Oct 15, 2021 at 12:39 PM Aldy Hernandez wrote:

> Also, I am PINGing patch 0002, which is the strlen pass conversion to
> the ranger.  As mentioned, this is just a change from an evrp client to
> a ranger client.  The APIs are exactly the same, and besides, the evrp
> analyzer is deprecated and slated for removal.  OK for trunk?

PING*2
Re: [PATCH] Try to resolve paths in threader without looking further back.
On Thu, Oct 21, 2021 at 4:51 PM Martin Sebor wrote:

> I'd like to see gimple-ssa-array-bounds invoked from the access
> pass too (instead of from VRP), and eventually -Wrestrict as well.

You can do that right now.  The pass has been converted to the new API
and it would just require calling it with a ranger instead of the
vr_values from VRP:

      array_bounds_checker array_checker (fun, &vrp_vr_values);
      array_checker.check ();

That is, move it where you want and pass it a fresh new gimple_ranger.
If there are any regressions, we'd be glad to look at them.

> I'm not sure about the strlen/sprintf warnings; those might need
> to stay where they are because they run as part of the optimizers
> there.
>
> (By the way, I don't see range info in the access pass at -O0.
> Should I?)

I assume you mean you don't see anything in the dump files.

None of the VRP passes (evrp included) run at -O0, so you wouldn't see
anything in the IL.  You *may* be able to see some global ranges that
DOM's use of the evrp engine exported, but I'm not sure.

You're going to have to instantiate a gimple_ranger and use it if you
want to have range info available, but that's not going to show up in
the IL, even after you use it, because it doesn't export global ranges
by default.

What are you trying to do?

Aldy
[PATCH] tree-optimization/102893 - properly DCE empty loops inside infinite loops
The following fixes the test for an exit edge I put in place for the fix for PR45178 where I somehow misunderstood how the cyclic list works. Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed. 2021-10-22 Richard Biener PR tree-optimization/102893 * tree-ssa-dce.c (find_obviously_necessary_stmts): Fix the test for an exit edge. * gcc.dg/tree-ssa/ssa-dce-9.c: New testcase. --- gcc/testsuite/gcc.dg/tree-ssa/ssa-dce-9.c | 10 ++ gcc/tree-ssa-dce.c| 2 +- 2 files changed, 11 insertions(+), 1 deletion(-) create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-dce-9.c diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dce-9.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dce-9.c new file mode 100644 index 000..e1ffa7f038d --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dce-9.c @@ -0,0 +1,10 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-tree-cddce1" } */ + +int main() +{ + while(1) +for(int i=0; i<900; i++){} +} + +/* { dg-final { scan-tree-dump-not "if" "cddce1" } } */ diff --git a/gcc/tree-ssa-dce.c b/gcc/tree-ssa-dce.c index c4907af923c..372e0691ae6 100644 --- a/gcc/tree-ssa-dce.c +++ b/gcc/tree-ssa-dce.c @@ -436,7 +436,7 @@ find_obviously_necessary_stmts (bool aggressive) for (auto loop : loops_list (cfun, 0)) /* For loops without an exit do not mark any condition. */ - if (loop->exits->next && !finite_loop_p (loop)) + if (loop->exits->next->e && !finite_loop_p (loop)) { if (dump_file) fprintf (dump_file, "cannot prove finiteness of loop %i\n", -- 2.31.1
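The one-token change in the hunk above is easier to see on a toy model of a cyclic list with a head element.  The sketch below assumes, as the fix suggests, that the list head carries a null edge and links back to itself when the list is empty.  toy_edge, toy_loop_exit and the helper functions are hypothetical simplifications, not the definitions GCC uses.

```cpp
#include <cassert>

// Simplified stand-ins; the real structures carry more fields.
struct toy_edge { int src; int dest; };

struct toy_loop_exit
{
  toy_edge *e;           // null only in the list head
  toy_loop_exit *prev;
  toy_loop_exit *next;
};

// An empty cyclic list: the head points at itself in both directions.
inline void init_exits (toy_loop_exit *head)
{
  head->e = nullptr;
  head->prev = head->next = head;
}

// Link NODE, carrying the real exit edge E, right after the head.
inline void add_exit (toy_loop_exit *head, toy_loop_exit *node, toy_edge *e)
{
  assert (e != nullptr);
  node->e = e;
  node->next = head->next;
  node->prev = head;
  head->next->prev = node;
  head->next = node;
}

// Shape of the old test: always true for a cyclic list, because
// head->next is never null, even when the loop has no exits at all.
inline bool has_exit_by_pointer (const toy_loop_exit *head)
{
  return head->next != nullptr;
}

// Shape of the fixed test: true only when the element after the head
// is a real exit rather than the head itself.
inline bool has_exit_by_edge (const toy_loop_exit *head)
{
  return head->next->e != nullptr;
}
```

On an empty list the pointer test still answers "yes" while the edge test answers "no", which is the distinction the comment "For loops without an exit do not mark any condition" relies on.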
Re: [RFC PATCH v2 1/1] [ARM] Add support for TLS register based stack protector canary access
On Thu, 21 Oct 2021 at 18:51, Ard Biesheuvel wrote: > > Add support for accessing the stack canary value via the TLS register, > so that multiple threads running in the same address space can use > distinct canary values. This is intended for the Linux kernel running in > SMP mode, where processes entering the kernel are essentially threads > running the same program concurrently: using a global variable for the > canary in that context is problematic because it can never be rotated, > and so the OS is forced to use the same value as long as it remains up. > > Using the TLS register to index the stack canary helps with this, as it > allows each CPU to context switch the TLS register along with the rest > of the process, permitting each process to use its own value for the > stack canary. > > 2021-10-21 Ard Biesheuvel > > * config/arm/arm-opts.h (enum stack_protector_guard): New > * config/arm/arm-protos.h (arm_stack_protect_tls_canary_mem): > New > * config/arm/arm.c (TARGET_STACK_PROTECT_GUARD): Define > (arm_option_override_internal): Handle and put in error checks > for stack protector guard options. > (arm_option_reconfigure_globals): Likewise > (arm_stack_protect_tls_canary_mem): New > (arm_stack_protect_guard): New > * config/arm/arm.md (stack_protect_set): New > (stack_protect_set_tls): Likewise > (stack_protect_test): Likewise > (stack_protect_test_tls): Likewise > * config/arm/arm.opt (-mstack-protector-guard): New > (-mstack-protector-guard-offset): New. 
> > Signed-off-by: Ard Biesheuvel > --- > gcc/config/arm/arm-opts.h | 6 ++ > gcc/config/arm/arm-protos.h | 2 + > gcc/config/arm/arm.c| 52 > gcc/config/arm/arm.md | 62 +++- > gcc/config/arm/arm.opt | 22 +++ > gcc/doc/invoke.texi | 9 +++ > 6 files changed, 151 insertions(+), 2 deletions(-) > > diff --git a/gcc/config/arm/arm-opts.h b/gcc/config/arm/arm-opts.h > index 5c4b62f404f7..581ba3c4fbbb 100644 > --- a/gcc/config/arm/arm-opts.h > +++ b/gcc/config/arm/arm-opts.h > @@ -69,4 +69,10 @@ enum arm_tls_type { >TLS_GNU, >TLS_GNU2 > }; > + > +/* Where to get the canary for the stack protector. */ > +enum stack_protector_guard { > + SSP_TLSREG, /* per-thread canary in TLS register */ > + SSP_GLOBAL /* global canary */ > +}; > #endif > diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h > index 9b1f61394ad7..37e80256a78d 100644 > --- a/gcc/config/arm/arm-protos.h > +++ b/gcc/config/arm/arm-protos.h > @@ -195,6 +195,8 @@ extern void arm_split_atomic_op (enum rtx_code, rtx, rtx, > rtx, rtx, rtx, rtx); > extern rtx arm_load_tp (rtx); > extern bool arm_coproc_builtin_available (enum unspecv); > extern bool arm_coproc_ldc_stc_legitimate_address (rtx); > +extern rtx arm_stack_protect_tls_canary_mem (void); > + > > #if defined TREE_CODE > extern void arm_init_cumulative_args (CUMULATIVE_ARGS *, tree, rtx, tree); > diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c > index c4ff06b087eb..0bf06e764dbb 100644 > --- a/gcc/config/arm/arm.c > +++ b/gcc/config/arm/arm.c > @@ -829,6 +829,9 @@ static const struct attribute_spec arm_attribute_table[] = > > #undef TARGET_MD_ASM_ADJUST > #define TARGET_MD_ASM_ADJUST arm_md_asm_adjust > + > +#undef TARGET_STACK_PROTECT_GUARD > +#define TARGET_STACK_PROTECT_GUARD arm_stack_protect_guard > > /* Obstack for minipool constant handling. 
*/ > static struct obstack minipool_obstack; > @@ -3155,6 +3158,26 @@ arm_option_override_internal (struct gcc_options *opts, >if (TARGET_THUMB2_P (opts->x_target_flags)) > opts->x_inline_asm_unified = true; > > + if (arm_stack_protector_guard == SSP_GLOBAL > + && opts->x_arm_stack_protector_guard_offset_str) > +{ > + error ("incompatible options %'-mstack-protector-guard=global%' and" > +"%'-mstack-protector-guard-offset=%qs%'", > +arm_stack_protector_guard_offset_str); > +} > + > + if (opts->x_arm_stack_protector_guard_offset_str) > +{ > + char *end; > + const char *str = arm_stack_protector_guard_offset_str; > + errno = 0; > + long offs = strtol (arm_stack_protector_guard_offset_str, &end, 0); > + if (!*str || *end || errno) > + error ("%qs is not a valid offset in %qs", str, > + "-mstack-protector-guard-offset="); > + arm_stack_protector_guard_offset = offs; > +} > + > #ifdef SUBTARGET_OVERRIDE_INTERNAL_OPTIONS >SUBTARGET_OVERRIDE_INTERNAL_OPTIONS; > #endif > @@ -3822,6 +3845,10 @@ arm_option_reconfigure_globals (void) >else > target_thread_pointer = TP_SOFT; > } > + > + if (arm_stack_protector_guard == SSP_TLSREG > + && target_thread_pointer != TP_CP15) > +error("%'-mstack-protector-guard=tls%' needs a hardware TLS register"); > } > > /* Perform some validation between
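The offset validation in the quoted arm_option_override_internal hunk follows the usual strtol idiom: clear errno, parse with base 0, then reject an empty string, trailing characters, or a range error.  A standalone sketch of that idiom is shown below.  parse_guard_offset is a hypothetical helper, not part of the patch, and unlike the quoted code it leaves the output untouched when parsing fails.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdlib>

// Parse STR as a signed offset.  Base 0 accepts decimal, octal ("010")
// and hexadecimal ("0x10") spellings.  Returns false on an empty string,
// on trailing garbage, or when the value does not fit in a long.
inline bool parse_guard_offset (const char *str, long *out)
{
  assert (str != nullptr && out != nullptr);

  char *end;
  errno = 0;
  long value = std::strtol (str, &end, 0);

  // Same three conditions as the quoted check: !*str || *end || errno.
  if (!*str || *end || errno)
    return false;

  *out = value;
  return true;
}
```

For example, "0x10" parses to 16 and "-8" to -8, while "", "12abc" and an out-of-range digit string are all rejected.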
[PATCH] [gfortran] Add support for allocate clause (OpenMP 5.0).
This patch adds support for OpenMP 5.0 allocate clause for fortran. It does not yet support the allocator-modifier as specified in OpenMP 5.1. The allocate clause is already supported in C/C++. gcc/fortran/ChangeLog: * dump-parse-tree.c (show_omp_clauses): Handle OMP_LIST_ALLOCATE. * gfortran.h (OMP_LIST_ALLOCATE): New enum value. (allocate): New member in gfc_symbol. * openmp.c (enum omp_mask1): Add OMP_CLAUSE_ALLOCATE. (gfc_match_omp_clauses): Handle OMP_CLAUSE_ALLOCATE (OMP_PARALLEL_CLAUSES, OMP_DO_CLAUSES, OMP_SECTIONS_CLAUSES) (OMP_TASK_CLAUSES, OMP_TASKLOOP_CLAUSES, OMP_TARGET_CLAUSES) (OMP_TEAMS_CLAUSES, OMP_DISTRIBUTE_CLAUSES) (OMP_SINGLE_CLAUSES): Add OMP_CLAUSE_ALLOCATE. (OMP_TASKGROUP_CLAUSES): New (gfc_match_omp_taskgroup): Use 'OMP_TASKGROUP_CLAUSES' instead of 'OMP_CLAUSE_TASK_REDUCTION' (resolve_omp_clauses): Handle OMP_LIST_ALLOCATE. (resolve_omp_do): Avoid warning when loop iteration variable is in allocate clause. * trans-openmp.c (gfc_trans_omp_clauses): Handle translation of allocate clause. (gfc_split_omp_clauses): Update for OMP_LIST_ALLOCATE. gcc/testsuite/ChangeLog: * gfortran.dg/gomp/allocate-1.f90: New test. * gfortran.dg/gomp/allocate-2.f90: New test. * gfortran.dg/gomp/collapse1.f90: Update error message. * gfortran.dg/gomp/openmp-simd-4.f90: Likewise. libgomp/ChangeLog: * testsuite/libgomp.fortran/allocate-1.c: New test. * testsuite/libgomp.fortran/allocate-1.f90: New test. 
--- gcc/fortran/dump-parse-tree.c | 1 + gcc/fortran/gfortran.h| 5 + gcc/fortran/openmp.c | 140 +++- gcc/fortran/trans-openmp.c| 34 ++ gcc/testsuite/gfortran.dg/gomp/allocate-1.f90 | 123 +++ gcc/testsuite/gfortran.dg/gomp/allocate-2.f90 | 45 +++ gcc/testsuite/gfortran.dg/gomp/collapse1.f90 | 2 +- .../gfortran.dg/gomp/openmp-simd-4.f90| 6 +- .../testsuite/libgomp.fortran/allocate-1.c| 7 + .../testsuite/libgomp.fortran/allocate-1.f90 | 333 ++ 10 files changed, 675 insertions(+), 21 deletions(-) create mode 100644 gcc/testsuite/gfortran.dg/gomp/allocate-1.f90 create mode 100644 gcc/testsuite/gfortran.dg/gomp/allocate-2.f90 create mode 100644 libgomp/testsuite/libgomp.fortran/allocate-1.c create mode 100644 libgomp/testsuite/libgomp.fortran/allocate-1.f90 diff --git a/gcc/fortran/dump-parse-tree.c b/gcc/fortran/dump-parse-tree.c index 14a307856fc..66af802ec36 100644 --- a/gcc/fortran/dump-parse-tree.c +++ b/gcc/fortran/dump-parse-tree.c @@ -1685,6 +1685,7 @@ show_omp_clauses (gfc_omp_clauses *omp_clauses) case OMP_LIST_USE_DEVICE_PTR: type = "USE_DEVICE_PTR"; break; case OMP_LIST_USE_DEVICE_ADDR: type = "USE_DEVICE_ADDR"; break; case OMP_LIST_NONTEMPORAL: type = "NONTEMPORAL"; break; + case OMP_LIST_ALLOCATE: type = "ALLOCATE"; break; case OMP_LIST_SCAN_IN: type = "INCLUSIVE"; break; case OMP_LIST_SCAN_EX: type = "EXCLUSIVE"; break; default: diff --git a/gcc/fortran/gfortran.h b/gcc/fortran/gfortran.h index 66192c07d8c..feae00052cc 100644 --- a/gcc/fortran/gfortran.h +++ b/gcc/fortran/gfortran.h @@ -1388,6 +1388,7 @@ enum OMP_LIST_USE_DEVICE_PTR, OMP_LIST_USE_DEVICE_ADDR, OMP_LIST_NONTEMPORAL, + OMP_LIST_ALLOCATE, OMP_LIST_NUM }; @@ -1880,6 +1881,10 @@ typedef struct gfc_symbol according to the Fortran standard. */ unsigned pass_as_value:1; + /* Used to check if a variable used in allocate clause has also been + used in privatization clause. 
*/ + unsigned allocate:1; + int refs; struct gfc_namespace *ns;/* namespace containing this symbol */ diff --git a/gcc/fortran/openmp.c b/gcc/fortran/openmp.c index dcf22ac2c2f..aac8d2580a4 100644 --- a/gcc/fortran/openmp.c +++ b/gcc/fortran/openmp.c @@ -911,6 +911,7 @@ enum omp_mask1 OMP_CLAUSE_MEMORDER, /* OpenMP 5.0. */ OMP_CLAUSE_DETACH, /* OpenMP 5.0. */ OMP_CLAUSE_AFFINITY, /* OpenMP 5.0. */ + OMP_CLAUSE_ALLOCATE, /* OpenMP 5.0. */ OMP_CLAUSE_BIND, /* OpenMP 5.0. */ OMP_CLAUSE_FILTER, /* OpenMP 5.1. */ OMP_CLAUSE_AT, /* OpenMP 5.1. */ @@ -1540,6 +1541,40 @@ gfc_match_omp_clauses (gfc_omp_clauses **cp, const omp_mask mask, } continue; } + if ((mask & OMP_CLAUSE_ALLOCATE) + && gfc_match ("allocate ( ") == MATCH_YES) + { + gfc_expr *allocator = NULL; + old_loc = gfc_current_locus; + m = gfc_match_expr (&allocator); + if (m != MATCH_YES) + { + gfc_error ("Expected allocator or variable list at %C"); + goto error; + } + if (gfc_match (" : ") != MATCH_YES) + { + /* If n
Re: José's pending bind(C) patches / status (was: Re: [Patch, fortran V3] PR fortran/100683 - Array initialization refuses valid)
Hi José, hi Fortraners,

triage of all listed patches:

On 21.06.21 17:21, José Rui Faustino de Sousa wrote:

https://gcc.gnu.org/pipermail/fortran/2021-April/055924.html
PR100040 - Wrong code with intent out assumed-rank allocatable
PR100029 - ICE on subroutine call with allocatable polymorphic
→ Both: Still occurs, ICE in gfc_deallocate_scalar_with_status
TODO: Review patch.

https://gcc.gnu.org/pipermail/fortran/2021-April/055933.html
PR100097 - Unlimited polymorphic pointers and allocatables have incorrect rank
PR100098 - Polymorphic pointers and allocatables have incorrect rank
→ Both: PASS
TODO: Check whether it makes sense to apply the testcase
TODO: Close PRs
→ See also patch below (2021-June/056169.html)

https://gcc.gnu.org/pipermail/fortran/2021-June/056168.html
PR96870 - Class name on error message
→ Fixed with sufficient test coverage; thus, I closed the PR.
Nothing to be done.

https://gcc.gnu.org/pipermail/fortran/2021-June/056167.html
PR96724 - Bogus warnings with the repeat intrinsic and the flag -Wconversion-extra
  repeat ('x', NCOPIES=i08) ! i08 is 20_1
shows:
  Warning: Conversion from INTEGER(1) to INTEGER(8) at (1) [-Wconversion-extra]
TODO: Review patch.

https://gcc.gnu.org/pipermail/fortran/2021-June/056163.html
Bug 93308 - bind(c) subroutine changes lower bound of array argument in caller
Bug 93963 - Select rank mishandling allocatable and pointer arguments with bind(c)
Bug 94327 - Bind(c) argument attributes are incorrectly set
Bug 94331 - Bind(C) corrupts array descriptors
Bug 97046 - Bad interaction between lbound/ubound, allocatable arrays and bind(C) subroutine with dimension(..) parameter
→ All already closed as FIXED
TODO: Nothing, unless we want to pick one of the testcases.

https://gcc.gnu.org/pipermail/fortran/2021-June/056162.html
PR94104 - Request for diagnostic improvement
   10 | print *, sumf(a)
      |1
  Error: Actual argument for ‘a’ must be a pointer at (1)
NOTE: as the dummy is intent(in), since F2008 alternatively a TARGET attr
would be also okay.
TODO: Review patch - in principle, I am fine with the patch, but I am not
sure the 'valid target' in the error message is clear enough.  Might
require some message tweaking for clarity.

https://gcc.gnu.org/pipermail/fortran/2021-June/056155.html
Gerald's PR100948 - [12 Regression] ICE in gfc_conv_expr_val, at fortran/trans-expr.c:9069
Still has an ICE.
TODO: Review patch.

https://gcc.gnu.org/pipermail/fortran/2021-June/056154.html
Bug 100906 - Bind(c): failure handling character with len/=1
→ Testcase now passes.
Bug 100907 - Bind(c): failure handling wide character
→ I think now okay – but the testcase assumes
  elem_len/sizeof(char) == #chars
  but for the C descriptor,
  elem_len / sizeof(char-type) = #chars
  Thus, sz is not 1 or 7 bytes but 4 or 28 bytes (or 1/7 characters)
Bug 100911 - Bind(c): failure handling C_PTR
→ Closed as FIXED.
Bug 100914 - Bind(c): errors handling complex
→ Closed as FIXED
Bug 100915 - Bind(c): failure handling C_FUNPTR
→ Closed as FIXED
Bug 100916 - Bind(c): CFI_type_other unimplemented
→ Bogus testcase (for 't(ype)' argument) otherwise it expects
  CFI_type_other instead of CFI_type_struct (TODO: Is that sensible?)
TODO: Check whether a testcase is needed
TODO: Close the three still open PRs

https://gcc.gnu.org/pipermail/fortran/2021-June/056152.html
Bug 101047 - Pointer explicit initialization fails
Bug 101048 - Class pointer explicit initialization refuses valid
  ..., pointer, save :: ptr => tgt
fails to associate ptr with tgt (wrong-code + rejects valid)
TODO: Review patch.

https://gcc.gnu.org/pipermail/fortran/2021-June/056159.html
PR92621 - Problems with memory handling with allocatable intent(out) arrays with bind(c)
I think mostly fixed by my big bind(C) patch, but there is still one
ICE with '-fcheck=all -fsanitize=undefined'
TODO: Fix that bug (unlikely to be fixed by José's patch)
TODO: Check whether testcase should be added and then close the PR

https://gcc.gnu.org/pipermail/fortran/2021-April/055982.html
PR100245 - ICE on automatic reallocation.
Still ICEs
TODO: Review patch.

https://gcc.gnu.org/pipermail/fortran/2021-April/055949.html
PR100136 - ICE, regression, using flag -fcheck=pointer
First testcase has an ICE with -fcheck=pointer
Second testcase always has an ICE + possibly missing func.
TODO: Review patch – and probably: follow-up patch for remaining issue

https://gcc.gnu.org/pipermail/fortran/2021-April/055946.html
PR100132 - Optimization breaks pointer association.
'fn spec' is wrong :-(
TODO: Review patch!

https://gcc.gnu.org/pipermail/fortran/2021-April/055934.html
PR100103 - Automatic reallocation fails inside select rank
Still segfaults at runtime for 'that = a' where the RHS is a parameter
and the LHS an allocatable assumed-rank array (inside select rank).
TODO: Review patch

https://gcc.gnu.org/pipermail/fortran/2021-June/056169.html
PR100097 - Unlimited polymorphic pointers a
Re: [PATCH] Canonicalize __atomic/sync_fetch_or/xor/and for constant mask.
On Thu, Oct 21, 2021 at 10:48 PM liuhongt wrote:
>
> Hi:
>   This patch tries to canonicalize the order of bit_and and nop_convert
> for __atomic_fetch_or_*, __atomic_fetch_xor_*,
> __atomic_xor_fetch_*, __sync_fetch_and_or_*,
> __sync_fetch_and_xor_*, __sync_xor_and_fetch_*,
> __atomic_fetch_and_*, __sync_fetch_and_and_* when the mask is constant.
>
> i.e.
>
> +/* Canonicalize
> +  _1 = __atomic_fetch_or_4 (&v, 1, 0);
> +  _2 = (int) _1;
> +  _5 = _2 & 1;
> +
> +to
> +
> +  _1 = __atomic_fetch_or_4 (&v, 1, 0);
> +  _2 = _1 & 1;
> +  _5 = (int) _2;
>
> +/* Convert
> +  _1 = __atomic_fetch_and_4 (a_6(D), 4294959103, 0);
> +  _2 = (int) _1;
> +  _3 = _2 & 8192;
> +to
> +  _1 = __atomic_fetch_and_4 (a_4(D), 4294959103, 0);
> +  _7 = _1 & 8192;
> +  _6 = (int) _7;
> +  So it can be handled by optimize_atomic_bit_test_and.  */
>
> I'm trying to rewrite the matching part in match.pd and find that the
> canonicalization is OK when the mask is constant, but not when it is
> variable, since it will be simplified back by
>
> /* In GIMPLE, getting rid of 2 conversions for one new results
>    in smaller IL.  */
> (simplify
>  (convert (bitop:cs@2 (nop_convert:s @0) @1))
>  (if (GIMPLE
>       && TREE_CODE (@1) != INTEGER_CST
>       && tree_nop_conversion_p (type, TREE_TYPE (@2))
>       && types_match (type, @0))
>   (bitop @0 (convert @1)
>
> The canonicalization for a variable mask would
>
> convert
>   _1 = ~mask_7;
>   _2 = (unsigned int) _1;
>   _3 = __atomic_fetch_and_4 (ptr_6, _2, 0);
>   _4 = (int) _3;
>   _5 = _4 & mask_7;
>
> to
>   _1 = ~mask_7;
>   _2 = (unsigned int) _1;
>   _3 = __atomic_fetch_and_4 (ptr_6, _2, 0);
>   _4 = (unsigned int) mask_7
>   _6 = _3 & _4
>   _5 = (int) _6
>
> and then be simplified back.
>
> I've also tried another way of simplification, like
>
> convert
>   _1 = ~mask_7;
>   _2 = (unsigned int) _1;
>   _3 = __atomic_fetch_and_4 (ptr_6, _2, 0);
>   _4 = (int) _3;
>   _5 = _4 & mask_7;
>
> to
>   _1 = (unsigned int)mask_7;
>   _2 = ~ _1;
>   _3 = __atomic_fetch_and_4 (ptr_6, _2, 0);
>   _6 = _3 & _1
>   _5 = (int)
>
> but that is prevented by the check below, since __atomic_fetch_and_4 is
> not CONST and we would need to regenerate the call with an updated
> parameter.
>
>         /* We can't and should not emit calls to non-const functions.  */
>         if (!(flags_from_decl_or_type (decl) & ECF_CONST))
>           return NULL;
>
>
> Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> Ok for trunk?
>
> gcc/ChangeLog:
>
>         * match.pd: Canonicalize bit_and and nop_convert order for
>         __atomic/sync_fetch_or/xor/and when the mask is constant.
>
> gcc/testsuite/ChangeLog:
>
>         * gcc.target/i386/pr102566-1a.c: New test.
>         * gcc.target/i386/pr102566-2a.c: New test.
> ---
>  gcc/match.pd                                | 118
>  gcc/testsuite/gcc.target/i386/pr102566-1a.c |  66 +++
>  gcc/testsuite/gcc.target/i386/pr102566-2a.c |  65 +++
>  3 files changed, 249 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr102566-1a.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr102566-2a.c
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 5bed2e12715..06b369d1ab1 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -104,6 +104,39 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>  (define_operator_list COND_TERNARY
>    IFN_COND_FMA IFN_COND_FMS IFN_COND_FNMA IFN_COND_FNMS)
>
> +/* __atomic_fetch_or_*, __atomic_fetch_xor_*, __atomic_xor_fetch_*  */
> +(define_operator_list ATOMIC_FETCH_OR_XOR_N
> +  BUILT_IN_ATOMIC_FETCH_OR_1 BUILT_IN_ATOMIC_FETCH_OR_2
> +  BUILT_IN_ATOMIC_FETCH_OR_4 BUILT_IN_ATOMIC_FETCH_OR_8
> +  BUILT_IN_ATOMIC_FETCH_OR_16
> +  BUILT_IN_ATOMIC_FETCH_XOR_1 BUILT_IN_ATOMIC_FETCH_XOR_2
> +  BUILT_IN_ATOMIC_FETCH_XOR_4 BUILT_IN_ATOMIC_FETCH_XOR_8
> +  BUILT_IN_ATOMIC_FETCH_XOR_16
> +  BUILT_IN_ATOMIC_XOR_FETCH_1 BUILT_IN_ATOMIC_XOR_FETCH_2
> +  BUILT_IN_ATOMIC_XOR_FETCH_4 BUILT_IN_ATOMIC_XOR_FETCH_8
> +  BUILT_IN_ATOMIC_XOR_FETCH_16)
> +/* __sync_fetch_and_or_*, __sync_fetch_and_xor_*, __sync_xor_and_fetch_*  */
> +(define_operator_list SYNC_FETCH_OR_XOR_N
> +  BUILT_IN_SYNC_FETCH_AND_OR_1 BUILT_IN_SYNC_FETCH_AND_OR_2
> +  BUILT_IN_SYNC_FETCH_AND_OR_4 BUILT_IN_SYNC_FETCH_AND_OR_8
> +  BUILT_IN_SYNC_FETCH_AND_OR_16
> +  BUILT_IN_SYNC_FETCH_AND_XOR_1 BUILT_IN_SYNC_FETCH_AND_XOR_2
> +  BUILT_IN_SYNC_FETCH_AND_XOR_4 BUILT_IN_SYNC_FETCH_AND_XOR_8
> +  BUILT_IN_SYNC_FETCH_AND_XOR_16
> +  BUILT_IN_SYNC_XOR_AND_FETCH_1 BUILT_IN_SYNC_XOR_AND_FETCH_2
> +  BUILT_IN_SYNC_XOR_AND_FETCH_4 BUILT_IN_SYNC_XOR_AND_FETCH_8
> +  BUILT_IN_SYNC_XOR_AND_FETCH_16)
> +/* __atomic_fetch_and_*.  */
> +(define_operator_list ATOMIC_FETCH_AND_N
> +  BUILT_IN_ATOMIC_FETCH_AND_1 BUILT_IN_ATOMIC_FETCH_AND_2
> +  BUILT_IN_ATOMIC_FETCH_AND_4 BUILT_IN_ATOMIC_FETCH_AND_8
> +  BUILT_IN_ATOMIC_FETCH_AND_16)
> +/* __sync_fetch_and_and_*.  */
> +(define_operator_list SYNC_FETCH_AND_AND_N
> +  BUILT_IN_SYNC_FETCH_AND_AND_1 BUILT_IN_SYNC_FETCH_AND_AND_2
> +  BUILT_IN_SYNC_FETCH_AND_AND_4 BUILT_IN_SYNC_FETCH_AND_AND_8
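At the source level, the two GIMPLE orderings described in the quoted patch correspond roughly to the two functions below.  This is only a sketch of the equivalence for a constant mask: the function names are hypothetical, std::atomic is used instead of the __atomic builtins, and nothing here claims which instructions a particular GCC version emits for either form.

```cpp
#include <atomic>
#include <cassert>

// "Convert first, then mask": the shape the patch wants to canonicalize
// away from.  The fetched value is cast to int before the bit test.
inline bool test_and_set_bit0_convert_first (std::atomic<unsigned> &v)
{
  int old = static_cast<int> (v.fetch_or (1u));
  return (old & 1) != 0;
}

// "Mask first, then convert": the canonical shape.  The constant mask is
// applied in the unsigned type returned by the atomic operation, and only
// the masked result is converted.
inline bool test_and_set_bit0_mask_first (std::atomic<unsigned> &v)
{
  unsigned masked = v.fetch_or (1u) & 1u;
  return static_cast<int> (masked) != 0;
}
```

Both functions set bit 0 and report whether it was already set, so they return the same answer for any starting value.  The conversion is a no-op on the bit pattern, which is why moving it across the constant mask is safe.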
Re: [PATCH] [gfortran] Add support for allocate clause (OpenMP 5.0).
Hi all,

On 22.10.21 15:05, Hafiz Abid Qadeer wrote:
> This patch adds support for OpenMP 5.0 allocate clause for fortran. It does not yet support the allocator-modifier as specified in OpenMP 5.1. The allocate clause is already supported in C/C++.

I think the following shouldn't block the acceptance of the patch, but I think we eventually need to handle the following as well:

    type t
      integer, allocatable :: xx(:)
    end type
    type(t) :: tt
    class(t), allocatable :: cc
    allocate(t :: cc)
    tt%xx = [1,2,3,4,5,6]
    cc%xx = [1,2,3,4,5,6]
    ! ...
    !$omp task firstprivate(tt, cc) allocate(h)
    ...

In my spec reading, both tt/cc itself and tt%xx and cc%xx should use the specified allocator. And unless I missed something (I only glanced at the patch so far), it is not handled.

But for derived types (except for recursive allocatables, valid since 5.1), I think it can be handled in gfc_omp_clause_copy_ctor / gfc_omp_clause_dtor, but I have not checked whether those support it properly.

For CLASS + recursive allocatables, it requires some more changes (which might be provided by my derived-type deep copy patch, of which only 1/3 has been written).

Tobias

PS: Just a side note, OpenMP has the following for Fortran:

"If any operation of the base language causes a reallocation of a variable that is allocated with a memory allocator then that memory allocator will be used to deallocate the current memory and to allocate the new memory. For allocated allocatable components of such variables, the allocator that will be used for the deallocation and allocation is unspecified."
Re: [PATCH] Try to resolve paths in threader without looking further back.
On 10/22/21 5:22 AM, Aldy Hernandez wrote:
> On Thu, Oct 21, 2021 at 4:51 PM Martin Sebor wrote:
>> I'd like to see gimple-ssa-array-bounds invoked from the access pass too (instead of from VRP), and eventually -Wrestrict as well.
>
> You can do that right now. The pass has been converted to the new API and it would just require calling it with a ranger instead of the vr_values from VRP:
>
>     array_bounds_checker array_checker (fun, &vrp_vr_values);
>     array_checker.check ();
>
> That is, move it where you want and pass it a fresh new gimple_ranger. If there are any regressions, we'd be glad to look at them.

I appreciate that and I'm not worried about regressions due to ranger code.

It's not so simple as it seems because of the optimizer dependencies I mentioned. -Warray-bounds runs before vectorization and the access pass after it. Moving it into the access pass will mean dealing with the fallout: either accepting regressions in the quality of warnings (bad locations due to vectorization merging distinct stores into one) or running the access pass at a different point in the pipeline, and facing regressions in the other warnings due to that. Running it twice, once earlier for -Warray-bounds and then again later for -Wstringop-overflow etc, would be less than optimal because they all do the same thing (compute object sizes and offsets) and should be able to share the same data (the pointer query cache). So the ideal solution is to find a middle ground where all these warnings can run from the same pass with optimal results.

-Warray-bounds might also need to be adjusted for -O0 to avoid warning on unreachable code, although, surprisingly, that hasn't been an issue for the other warnings now enabled at -O0.

All this will take some time, which I'm out of for this stage 1.

>> I'm not sure about the strlen/sprintf warnings; those might need to stay where they are because they run as part of the optimizers there.
>>
>> (By the way, I don't see range info in the access pass at -O0. Should I?)
>
> I assume you mean you don't see anything in the dump files.

I mean that I don't get accurate range info from the ranger instance in any function. I'd like the example below to trigger a warning even at -O0 but it doesn't because n's range is [0, UINT_MAX] instead of [7, UINT_MAX]:

    char a[4];

    void f (unsigned n)
    {
      if (n < 7)
        n = 7;
      __builtin_memset (a, 0, n);
    }

> None of the VRP passes (evrp included) run at -O0, so you wouldn't see anything in the IL. You *may* be able to see some global ranges that DOM's use of the evrp engine exported, but I'm not sure.
>
> You're going to have to instantiate a gimple_ranger and use it if you want to have range info available, but that's not going to show up in the IL, even after you use it, because it doesn't export global ranges by default.
>
> What are you trying to do?

The above. The expected warning runs in the access warning pass. It uses the per-function instance of the ranger but it gets back a range for the type. To see that put a breakpoint in get_size_range() in pointer-query.cc and compile the above with -O0.

Martin
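To make the expected range in the example above concrete, here is a small self-contained sketch. The helper names `clamp_len` and `check_clamp_range` are hypothetical; `clamp_len` only isolates the clamp from `f` so that its result can be checked. It deliberately does not call `__builtin_memset`, since with any of these lengths that call would write past the end of `a`.

```c
#include <assert.h>
#include <limits.h>

char a[4];

/* Hypothetical helper: the clamp from f() in the message, returned
   instead of being passed on to __builtin_memset.  */
static unsigned
clamp_len (unsigned n)
{
  if (n < 7)
    n = 7;
  return n;
}

/* The result always lies in [7, UINT_MAX], so it always exceeds
   sizeof a, whatever the caller passes in.  */
static void
check_clamp_range (void)
{
  assert (clamp_len (0) == 7);
  assert (clamp_len (6) == 7);
  assert (clamp_len (7) == 7);
  assert (clamp_len (UINT_MAX) == UINT_MAX);
  assert (clamp_len (0) > sizeof a);
}
```

This only illustrates why [7, UINT_MAX] is the range one would like the access pass to see for `n`; it says nothing about how the ranger computes it.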
[PATCH 1/6] aarch64: Move Neon vector-tuple type declaration into the compiler
Hi,

As subject, this patch declares the Neon vector-tuple types inside the compiler instead of in the arm_neon.h header. This is a necessary first step before adding corresponding machine modes to the AArch64 backend.

The vector-tuple types are implemented using a #pragma. This means initialization of builtin functions that have vector-tuple types as arguments or return values has to be delayed until the #pragma is handled.

Bootstrapped and regression tested on aarch64-none-linux-gnu - no issues.

Note that this patch series cannot be merged until the following has been accepted:
https://gcc.gnu.org/pipermail/gcc-patches/2021-October/581948.html

Ok for master with this proviso?

Thanks,
Jonathan

---

gcc/ChangeLog:

2021-09-10  Jonathan Wright

    * config/aarch64/aarch64-builtins.c (aarch64_init_simd_builtins):
    Factor out main loop to...
    (aarch64_init_simd_builtin_functions): This new function.
    (register_tuple_type): Define.
    (aarch64_scalar_builtin_type_p): Define.
    (handle_arm_neon_h): Define.
    * config/aarch64/aarch64-c.c (aarch64_pragma_aarch64): Handle
    pragma for arm_neon.h.
    * config/aarch64/aarch64-protos.h (aarch64_advsimd_struct_mode_p):
    Declare.
    (handle_arm_neon_h): Likewise.
    * config/aarch64/aarch64.c (aarch64_advsimd_struct_mode_p):
    Remove static modifier.
    * config/aarch64/arm_neon.h (target): Remove Neon vector
    structure type definitions.

Attachment: rb14838.patch
[PATCH 2/6] gcc/expr.c: Remove historic workaround for broken SIMD subreg
Hi,

A long time ago, using a parallel to take a subreg of a SIMD register was broken. This temporary fix[1] (from 2003) spilled these registers to memory and reloaded the appropriate part to obtain the subreg.

The fix initially existed for the benefit of the PowerPC E500 - a platform for which GCC removed support a number of years ago. Regardless, a proper mechanism for taking a subreg of a SIMD register exists now anyway.

This patch removes the workaround thus preventing SIMD registers being dumped to memory unnecessarily - which sometimes can't be fixed by later passes.

Bootstrapped and regression tested on aarch64-none-linux-gnu and x86_64-pc-linux-gnu - no issues.

Ok for master?

Thanks,
Jonathan

[1] https://gcc.gnu.org/pipermail/gcc-patches/2003-April/102099.html

---

gcc/ChangeLog:

2021-10-11  Jonathan Wright

    * expr.c (emit_group_load_1): Remove historic workaround.

Attachment: rb14923.patch
[PATCH 3/6] gcc/expmed.c: Ensure vector modes are tieable before extraction
Hi,

Extracting a bitfield from a vector can be achieved by casting the vector to a new type whose elements are the same size as the desired bitfield, before generating a subreg. However, this is only an optimization if the original vector can be accessed in the new machine mode without first being copied - a condition denoted by the TARGET_MODES_TIEABLE_P hook.

This patch adds a check to make sure that the vector modes are tieable before attempting to generate a subreg. This is a necessary prerequisite for a subsequent patch that will introduce new machine modes for Arm Neon vector-tuple types.

Bootstrapped and regression tested on aarch64-none-linux-gnu and x86_64-pc-linux-gnu - no issues.

Ok for master?

Thanks,
Jonathan

---

gcc/ChangeLog:

2021-10-11  Jonathan Wright

    * expmed.c (extract_bit_field_1): Ensure modes are tieable.

Attachment: rb14926.patch
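As a source-level analogy for the extraction strategy described in this message, the sketch below reinterprets a vector of bytes as a vector of 32-bit elements and then reads one element. It is only an illustration written with GCC's generic vector extensions: the type names `v16qi_t` and `v4si_t` and the helpers `extract_u32` and `check_extract_u32` are hypothetical, and nothing here corresponds to the RTL-level code in extract_bit_field_1 or to the TARGET_MODES_TIEABLE_P hook, which has no C-level equivalent.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef uint8_t v16qi_t __attribute__ ((vector_size (16)));
typedef uint32_t v4si_t __attribute__ ((vector_size (16)));

/* Extract the 32-bit field that starts at bit BITNUM (a multiple of 32)
   by viewing the same 16 bytes as four 32-bit elements.  */
static uint32_t
extract_u32 (v16qi_t v, unsigned bitnum)
{
  v4si_t as_words = (v4si_t) v;   /* same bits, different element size */
  return as_words[bitnum / 32];
}

/* Compare against a plain byte-wise copy, which is independent of the
   target's endianness.  */
static void
check_extract_u32 (void)
{
  uint8_t bytes[16];
  for (int i = 0; i < 16; i++)
    bytes[i] = (uint8_t) (i * 3 + 1);

  v16qi_t v;
  memcpy (&v, bytes, sizeof v);

  for (unsigned field = 0; field < 4; field++)
    {
      uint32_t expected;
      memcpy (&expected, bytes + 4 * field, sizeof expected);
      assert (extract_u32 (v, 32 * field) == expected);
    }
}
```

The cast is free at the source level; the point of the patch is that the equivalent reinterpretation inside the compiler is only free when the two machine modes are tieable.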
[PATCH 5/6] gcc/lower_subreg.c: Prevent decomposition if modes are not tieable
Hi,

Preventing decomposition if modes are not tieable is necessary to stop AArch64 partial Neon structure modes being treated as packed in registers.

This is a necessary prerequisite for a future AArch64 PCS change to maintain good code generation.

Bootstrapped and regression tested on:
* x86_64-pc-linux-gnu - no issues.
* aarch64-none-linux-gnu - two test failures which will be fixed by the next patch in this series.

Ok for master?

Thanks,
Jonathan

---

gcc/ChangeLog:

2021-10-14  Jonathan Wright

    * lower-subreg.c (simple_move): Prevent decomposition if
    modes are not tieable.

Attachment: rb14936.patch
[PATCH 6/6] aarch64: Pass and return Neon vector-tuple types without a parallel
Hi,

Neon vector-tuple types can be passed in registers on function call and return - there is no need to generate a parallel rtx. This patch adds cases to detect vector-tuple modes and generates an appropriate register rtx.

This change greatly improves code generated when passing Neon vector-tuple types between functions; many new test cases are added to defend these improvements.

Bootstrapped and regression tested on aarch64-none-linux-gnu and aarch64_be-none-linux-gnu - no issues.

Ok for master?

Thanks,
Jonathan

---

gcc/ChangeLog:

2021-10-07  Jonathan Wright

    * config/aarch64/aarch64.c (aarch64_function_value): Generate
    a register rtx for Neon vector-tuple modes.
    (aarch64_layout_arg): Likewise.

gcc/testsuite/ChangeLog:

    * gcc.target/aarch64/vector_structure_intrinsics.c: New code
    generation tests.

Attachment: rb14937.patch
Re: [PATCH 1/6] aarch64: Move Neon vector-tuple type declaration into the compiler
Jonathan Wright writes: > Hi, > > As subject, this patch declares the Neon vector-tuple types inside the > compiler instead of in the arm_neon.h header. This is a necessary first > step before adding corresponding machine modes to the AArch64 > backend. > > The vector-tuple types are implemented using a #pragma. This means > initialization of builtin functions that have vector-tuple types as > arguments or return values has to be delayed until the #pragma is > handled. > > Bootstrapped and regression tested on aarch64-none-linux-gnu - no > issues. > > Note that this patch series cannot be merged until the following has > been accepted: > https://gcc.gnu.org/pipermail/gcc-patches/2021-October/581948.html > > Ok for master with this proviso? > > Thanks, > Jonathan > > --- > > gcc/ChangeLog: > > 2021-09-10 Jonathan Wright > > * config/aarch64/aarch64-builtins.c (aarch64_init_simd_builtins): > Factor out main loop to... > (aarch64_init_simd_builtin_functions): This new function. > (register_tuple_type): Define. > (aarch64_scalar_builtin_type_p): Define. > (handle_arm_neon_h): Define. > * config/aarch64/aarch64-c.c (aarch64_pragma_aarch64): Handle > pragma for arm_neon.h. > * config/aarch64/aarch64-protos.h (aarch64_advsimd_struct_mode_p): > Declare. > (handle_arm_neon_h): Likewise. > * config/aarch64/aarch64.c (aarch64_advsimd_struct_mode_p): > Remove static modifier. > * config/aarch64/arm_neon.h (target): Remove Neon vector > structure type definitions. OK when the prerequisite you mention is applied, thanks. 
Richard > diff --git a/gcc/config/aarch64/aarch64-builtins.c > b/gcc/config/aarch64/aarch64-builtins.c > index > 1a507ea59142d0b5977b0167abfe9a58a567adf7..27f2dc5ea4337da80f3b84b6a798263e7bd9012e > 100644 > --- a/gcc/config/aarch64/aarch64-builtins.c > +++ b/gcc/config/aarch64/aarch64-builtins.c > @@ -1045,32 +1045,22 @@ aarch64_init_fcmla_laneq_builtins (void) > } > > void > -aarch64_init_simd_builtins (void) > +aarch64_init_simd_builtin_functions (bool called_from_pragma) > { >unsigned int i, fcode = AARCH64_SIMD_PATTERN_START; > > - if (aarch64_simd_builtins_initialized_p) > -return; > - > - aarch64_simd_builtins_initialized_p = true; > - > - aarch64_init_simd_builtin_types (); > - > - /* Strong-typing hasn't been implemented for all AdvSIMD builtin > intrinsics. > - Therefore we need to preserve the old __builtin scalar types. It can be > - removed once all the intrinsics become strongly typed using the > qualifier > - system. */ > - aarch64_init_simd_builtin_scalar_types (); > - > - tree lane_check_fpr = build_function_type_list (void_type_node, > - size_type_node, > - size_type_node, > - intSI_type_node, > - NULL); > - aarch64_builtin_decls[AARCH64_SIMD_BUILTIN_LANE_CHECK] > -= aarch64_general_add_builtin ("__builtin_aarch64_im_lane_boundsi", > -lane_check_fpr, > -AARCH64_SIMD_BUILTIN_LANE_CHECK); > + if (!called_from_pragma) > +{ > + tree lane_check_fpr = build_function_type_list (void_type_node, > + size_type_node, > + size_type_node, > + intSI_type_node, > + NULL); > + aarch64_builtin_decls[AARCH64_SIMD_BUILTIN_LANE_CHECK] > + = aarch64_general_add_builtin ("__builtin_aarch64_im_lane_boundsi", > +lane_check_fpr, > +AARCH64_SIMD_BUILTIN_LANE_CHECK); > +} > >for (i = 0; i < ARRAY_SIZE (aarch64_simd_builtin_data); i++, fcode++) > { > @@ -1100,6 +1090,18 @@ aarch64_init_simd_builtins (void) >tree return_type = void_type_node, args = void_list_node; >tree eltype; > > + int struct_mode_args = 0; > + for (int j = op_num; j >= 0; j--) > + { > + machine_mode 
op_mode = insn_data[d->code].operand[j].mode; > + if (aarch64_advsimd_struct_mode_p (op_mode)) > + struct_mode_args++; > + } > + > + if ((called_from_pragma && struct_mode_args == 0) > + || (!called_from_pragma && struct_mode_args > 0)) > + continue; > + >/* Build a function type directly from the insn_data for this >builtin. The build_function_type () function takes care of >removing duplicates for us. */ > @@ -1173,9 +1175,82 @@ aarch64_init_simd_builtins (void) >fndecl = aarch64_general_add_builtin (namebuf, ftype, fcode, attrs); >aarch64_builtin_decls[fcode] = fndecl; > } > +} > + > +/* Register the tuple type that c
Re: [PATCH 2/6] gcc/expr.c: Remove historic workaround for broken SIMD subreg
Jonathan Wright writes: > Hi, > > A long time ago, using a parallel to take a subreg of a SIMD register > was broken. This temporary fix[1] (from 2003) spilled these registers > to memory and reloaded the appropriate part to obtain the subreg. > > The fix initially existed for the benefit of the PowerPC E500 - a > platform for which GCC removed support a number of years ago. > Regardless, a proper mechanism for taking a subreg of a SIMD register > exists now anyway. > > This patch removes the workaround thus preventing SIMD registers > being dumped to memory unnecessarily - which sometimes can't be fixed > by later passes. > > Bootstrapped and regression tested on aarch64-none-linux-gnu and > x86_64-pc-linux-gnu - no issues. > > Ok for master? > > Thanks, > Jonathan > > [1] https://gcc.gnu.org/pipermail/gcc-patches/2003-April/102099.html > > --- > > gcc/ChangeLog: > > 2021-10-11 Jonathan Wright > > * expr.c (emit_group_load_1): Remove historic workaround. OK, thanks. Richard > diff --git a/gcc/expr.c b/gcc/expr.c > index > e0bcbccd9053df168c2e861414729fc7cf017f85..62446118b7beb725933ec6f7b0386e7e4b84fa90 > 100644 > --- a/gcc/expr.c > +++ b/gcc/expr.c > @@ -2508,19 +2508,6 @@ emit_group_load_1 (rtx *tmps, rtx dst, rtx orig_src, > tree type, > NULL); > } > } > - /* FIXME: A SIMD parallel will eventually lead to a subreg of a > - SIMD register, which is currently broken. While we get GCC > - to emit proper RTL for these cases, let's dump to memory. */ > - else if (VECTOR_MODE_P (GET_MODE (dst)) > -&& REG_P (src)) > - { > - poly_uint64 slen = GET_MODE_SIZE (GET_MODE (src)); > - rtx mem; > - > - mem = assign_stack_temp (GET_MODE (src), slen); > - emit_move_insn (mem, src); > - tmps[i] = adjust_address (mem, mode, bytepos); > - } >else if (CONSTANT_P (src) && GET_MODE (dst) != BLKmode > && XVECLEN (dst, 0) > 1) > tmps[i] = simplify_gen_subreg (mode, src, GET_MODE (dst), bytepos);
Re: [PATCH 3/6] gcc/expmed.c: Ensure vector modes are tieable before extraction
Jonathan Wright writes: > Hi, > > Extracting a bitfield from a vector can be achieved by casting the > vector to a new type whose elements are the same size as the desired > bitfield, before generating a subreg. However, this is only an > optimization if the original vector can be accessed in the new > machine mode without first being copied - a condition denoted by the > TARGET_MODES_TIEABLE_P hook. > > This patch adds a check to make sure that the vector modes are > tieable before attempting to generate a subreg. This is a necessary > prerequisite for a subsequent patch that will introduce new machine > modes for Arm Neon vector-tuple types. > > Bootstrapped and regression tested on aarch64-none-linux-gnu and > x86_64-pc-linux-gnu - no issues. > > Ok for master? > > Thanks, > Jonathan > > --- > > gcc/ChangeLog: > > 2021-10-11 Jonathan Wright > > * expmed.c (extract_bit_field_1): Ensure modes are tieable. OK, thanks. Richard > diff --git a/gcc/expmed.c b/gcc/expmed.c > index > 59734d4841cbd2056a7d5bda9134af79c8024c87..f58fb9d877d66809b39253ccdc803f0ecb009326 > 100644 > --- a/gcc/expmed.c > +++ b/gcc/expmed.c > @@ -1734,7 +1734,8 @@ extract_bit_field_1 (rtx str_rtx, poly_uint64 bitsize, > poly_uint64 bitnum, >FOR_EACH_MODE_FROM (new_mode, new_mode) > if (known_eq (GET_MODE_SIZE (new_mode), GET_MODE_SIZE (GET_MODE (op0))) > && known_eq (GET_MODE_UNIT_SIZE (new_mode), GET_MODE_SIZE (tmode)) > - && targetm.vector_mode_supported_p (new_mode)) > + && targetm.vector_mode_supported_p (new_mode) > + && targetm.modes_tieable_p (GET_MODE (op0), new_mode)) > break; >if (new_mode != VOIDmode) > op0 = gen_lowpart (new_mode, op0);
[PATCH 4/6] aarch64: Add machine modes for Neon vector-tuple types
Hi, Until now, GCC has used large integer machine modes (OI, CI and XI) to model Neon vector-tuple types. This is suboptimal for many reasons, the most notable are: 1) Large integer modes are opaque and modifying one vector in the tuple requires a lot of inefficient set/get gymnastics. The result is a lot of superfluous move instructions. 2) Large integer modes do not map well to types that are tuples of 64-bit vectors - we need additional zero-padding which again results in superfluous move instructions. This patch adds new machine modes that better model the C-level Neon vector-tuple types. The approach is somewhat similar to that already used for SVE vector-tuple types. All of the AArch64 backend patterns and builtins that manipulate Neon vector tuples are updated to use the new machine modes. This has the effect of significantly reducing the amount of boiler-plate code in the arm_neon.h header. While this patch increases the quality of code generated in many instances, there is still room for significant improvement - which will be attempted in subsequent patches. Bootstrapped and regression tested on aarch64-none-linux-gnu and aarch64_be-none-linux-gnu - no issues. Ok for master? Thanks, Jonathan --- gcc/ChangeLog: 2021-08-09 Jonathan Wright Richard Sandiford * config/aarch64/aarch64-builtins.c (v2x8qi_UP): Define. (v2x4hi_UP): Likewise. (v2x4hf_UP): Likewise. (v2x4bf_UP): Likewise. (v2x2si_UP): Likewise. (v2x2sf_UP): Likewise. (v2x1di_UP): Likewise. (v2x1df_UP): Likewise. (v2x16qi_UP): Likewise. (v2x8hi_UP): Likewise. (v2x8hf_UP): Likewise. (v2x8bf_UP): Likewise. (v2x4si_UP): Likewise. (v2x4sf_UP): Likewise. (v2x2di_UP): Likewise. (v2x2df_UP): Likewise. (v3x8qi_UP): Likewise. (v3x4hi_UP): Likewise. (v3x4hf_UP): Likewise. (v3x4bf_UP): Likewise. (v3x2si_UP): Likewise. (v3x2sf_UP): Likewise. (v3x1di_UP): Likewise. (v3x1df_UP): Likewise. (v3x16qi_UP): Likewise. (v3x8hi_UP): Likewise. (v3x8hf_UP): Likewise. (v3x8bf_UP): Likewise. (v3x4si_UP): Likewise. 
(v3x4sf_UP): Likewise. (v3x2di_UP): Likewise. (v3x2df_UP): Likewise. (v4x8qi_UP): Likewise. (v4x4hi_UP): Likewise. (v4x4hf_UP): Likewise. (v4x4bf_UP): Likewise. (v4x2si_UP): Likewise. (v4x2sf_UP): Likewise. (v4x1di_UP): Likewise. (v4x1df_UP): Likewise. (v4x16qi_UP): Likewise. (v4x8hi_UP): Likewise. (v4x8hf_UP): Likewise. (v4x8bf_UP): Likewise. (v4x4si_UP): Likewise. (v4x4sf_UP): Likewise. (v4x2di_UP): Likewise. (v4x2df_UP): Likewise. (TYPES_GETREGP): Delete. (TYPES_SETREGP): Likewise. (TYPES_LOADSTRUCT_U): Define. (TYPES_LOADSTRUCT_P): Likewise. (TYPES_LOADSTRUCT_LANE_U): Likewise. (TYPES_LOADSTRUCT_LANE_P): Likewise. (TYPES_STORE1P): Move for consistency. (TYPES_STORESTRUCT_U): Define. (TYPES_STORESTRUCT_P): Likewise. (TYPES_STORESTRUCT_LANE_U): Likewise. (TYPES_STORESTRUCT_LANE_P): Likewise. (aarch64_simd_tuple_types): Define. (aarch64_lookup_simd_builtin_type): Handle tuple type lookup. (aarch64_init_simd_builtin_functions): Update frontend lookup for builtin functions after handling arm_neon.h pragma. (register_tuple_type): Manually set modes of single-integer tuple types. Record tuple types. * config/aarch64/aarch64-modes.def (ADV_SIMD_D_REG_STRUCT_MODES): Define D-register tuple modes. (ADV_SIMD_Q_REG_STRUCT_MODES): Define Q-register tuple modes. (SVE_MODES): Give single-vector modes priority over vector- tuple modes. (VECTOR_MODES_WITH_PREFIX): Set partial-vector mode order to be after all single-vector modes. * config/aarch64/aarch64-simd-builtins.def: Update builtin generator macros to reflect modifications to the backend patterns. * config/aarch64/aarch64-simd.md (aarch64_simd_ld2): Use vector-tuple mode iterator and rename to... (aarch64_simd_ld2): This. (aarch64_simd_ld2r): Use vector-tuple mode iterator and rename to... (aarch64_simd_ld2r): This. (aarch64_vec_load_lanesoi_lane): Use vector-tuple mode iterator and rename to... (aarch64_vec_load_lanes_lane): This. (vec_load_lanesoi): Use vector-tuple mode iterator and rename to... (vec_load_lanes): This. 
(aarch64_simd_st2): Use vector-tuple mode iterator and rename to... (aarch64_simd_st2): This. (aarch64_vec_store_lanesoi_lane): Use vector-tuple mode iterator and rename to... (aarch64_vec_store_lanes_lane): This.
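The zero-padding problem mentioned in point 2 of the patch description can be put into numbers with a little arithmetic. The sketch below is only an illustration under a stated assumption: that the large integer modes OI, CI and XI are 32, 48 and 64 bytes wide, that is, 16 bytes per tuple element. That assumption is not stated in the message itself. All names (`D_REG_BYTES`, `Q_REG_BYTES`, `large_int_mode_bytes`, `tuple_payload_bytes`, `padding_bytes`, `check_padding`) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

enum { D_REG_BYTES = 8, Q_REG_BYTES = 16 };

/* Assumption: a tuple of NVECS vectors used to be modelled by an integer
   mode of NVECS * 16 bytes (OI, CI, XI for NVECS = 2, 3, 4).  */
static size_t
large_int_mode_bytes (unsigned nvecs)
{
  return (size_t) nvecs * Q_REG_BYTES;
}

/* Bytes of real data in a tuple of NVECS vectors of VEC_BYTES each.  */
static size_t
tuple_payload_bytes (unsigned nvecs, size_t vec_bytes)
{
  return nvecs * vec_bytes;
}

/* Bytes of padding needed to fit the tuple into the integer mode.  */
static size_t
padding_bytes (unsigned nvecs, size_t vec_bytes)
{
  return large_int_mode_bytes (nvecs) - tuple_payload_bytes (nvecs, vec_bytes);
}

static void
check_padding (void)
{
  /* Tuples of 128-bit vectors fill the integer mode exactly.  */
  for (unsigned n = 2; n <= 4; n++)
    assert (padding_bytes (n, Q_REG_BYTES) == 0);

  /* Tuples of 64-bit vectors leave half of the mode as padding.  */
  assert (padding_bytes (2, D_REG_BYTES) == 16);
  assert (padding_bytes (3, D_REG_BYTES) == 24);
  assert (padding_bytes (4, D_REG_BYTES) == 32);
}
```

Under that assumption, half of every D-register tuple's representation would be padding, which is consistent with the superfluous moves the description complains about.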
Re: [PATCH] x86_64: Add insn patterns for V1TI mode logic operations.
On Fri, Oct 22, 2021 at 9:19 AM Roger Sayle wrote: > > > On x86_64, V1TI mode holds a 128-bit integer value in a (vector) SSE > register (where regular TI mode uses a pair of 64-bit general purpose > scalar registers). This patch improves the implementation of AND, IOR, > XOR and NOT on these values. > > The benefit is demonstrated by the following simple test program: > > typedef unsigned __int128 v1ti __attribute__ ((__vector_size__ (16))); > v1ti and(v1ti x, v1ti y) { return x & y; } > v1ti ior(v1ti x, v1ti y) { return x | y; } > v1ti xor(v1ti x, v1ti y) { return x ^ y; } > v1ti not(v1ti x) { return ~x; } > > For which GCC currently generates the rather large: > > and:movdqa %xmm0, %xmm2 > movq%xmm1, %rdx > movq%xmm0, %rax > andq%rdx, %rax > movhlps %xmm2, %xmm3 > movhlps %xmm1, %xmm4 > movq%rax, %xmm0 > movq%xmm4, %rdx > movq%xmm3, %rax > andq%rdx, %rax > movq%rax, %xmm5 > punpcklqdq %xmm5, %xmm0 > ret > > ior:movdqa %xmm0, %xmm2 > movq%xmm1, %rdx > movq%xmm0, %rax > orq %rdx, %rax > movhlps %xmm2, %xmm3 > movhlps %xmm1, %xmm4 > movq%rax, %xmm0 > movq%xmm4, %rdx > movq%xmm3, %rax > orq %rdx, %rax > movq%rax, %xmm5 > punpcklqdq %xmm5, %xmm0 > ret > > xor:movdqa %xmm0, %xmm2 > movq%xmm1, %rdx > movq%xmm0, %rax > xorq%rdx, %rax > movhlps %xmm2, %xmm3 > movhlps %xmm1, %xmm4 > movq%rax, %xmm0 > movq%xmm4, %rdx > movq%xmm3, %rax > xorq%rdx, %rax > movq%rax, %xmm5 > punpcklqdq %xmm5, %xmm0 > ret > > not:movdqa %xmm0, %xmm1 > movq%xmm0, %rax > notq%rax > movhlps %xmm1, %xmm2 > movq%rax, %xmm0 > movq%xmm2, %rax > notq%rax > movq%rax, %xmm3 > punpcklqdq %xmm3, %xmm0 > ret > > > with this patch we now generate the much more efficient: > > and:pand%xmm1, %xmm0 > ret > > ior:por %xmm1, %xmm0 > ret > > xor:pxor%xmm1, %xmm0 > ret > > not:pcmpeqd %xmm1, %xmm1 > pxor%xmm1, %xmm0 > ret > > > For my first few attempts at this patch I tried adding V1TI to the > existing VI and VI12_AVX_512F mode iterators, but these then have > dependencies on other iterators (and attributes), and 
so on until > everything ties itself into a knot, as V1TI mode isn't really a > first-class vector mode on x86_64. Hence I ultimately opted to use > simple stand-alone patterns (as used by the existing TF mode support). > > This patch has been tested on x86_64-pc-linux-gnu with "make bootstrap" > and "make -k check" with no new failures. Ok for mainline? > > > 2021-10-22 Roger Sayle > > gcc/ChangeLog > * config/i386/sse.md (v1ti3): New define_insn to > implement V1TImode AND, IOR and XOR on TARGET_SSE2 (and above). > (one_cmplv1ti2): New define expand. > > gcc/testsuite/ChangeLog > * gcc.target/i386/sse2-v1ti-logic.c: New test case. > * gcc.target/i386/sse2-v1ti-logic-2.c: New test case. There is no need for /* { dg-require-effective-target sse2 } */ for compile tests. The compilation does not reach the assembler. OK with the above change. BTW: You can add testcases to the main patch with "git add " and then create the patch with "git diff HEAD". Thanks, Uros.
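For reference, the test program quoted in the patch description can be turned into a small self-checking sketch. The four operations are the ones from the message; they are renamed here (`v1ti_and`, `v1ti_ior`, `v1ti_xor`, `v1ti_not`) only so that the file stays valid if it is ever compiled as C++, where `and`, `xor` and `not` are reserved. The helpers `make_v1ti`, `v1ti_equal` and `check_v1ti_logic` are hypothetical additions. The sketch assumes a target with `__int128` support, such as x86_64, and checks values only, not the instructions generated.

```c
#include <assert.h>

typedef unsigned __int128 v1ti __attribute__ ((__vector_size__ (16)));

/* The four operations from the patch description.  */
static v1ti v1ti_and (v1ti x, v1ti y) { return x & y; }
static v1ti v1ti_ior (v1ti x, v1ti y) { return x | y; }
static v1ti v1ti_xor (v1ti x, v1ti y) { return x ^ y; }
static v1ti v1ti_not (v1ti x) { return ~x; }

/* Hypothetical helper: build a V1TI value from two 64-bit halves.  */
static v1ti
make_v1ti (unsigned long long hi, unsigned long long lo)
{
  v1ti v = { ((unsigned __int128) hi << 64) | lo };
  return v;
}

/* Hypothetical helper: compare the single 128-bit element.  */
static int
v1ti_equal (v1ti a, v1ti b)
{
  return a[0] == b[0];
}

static void
check_v1ti_logic (void)
{
  v1ti x = make_v1ti (0xFFFF0000FFFF0000ULL, 0x00000000FFFFFFFFULL);
  v1ti y = make_v1ti (0x0F0F0F0F0F0F0F0FULL, 0xFFFFFFFF00000000ULL);

  assert (v1ti_equal (v1ti_and (x, y),
                      make_v1ti (0x0F0F00000F0F0000ULL, 0x0ULL)));
  assert (v1ti_equal (v1ti_ior (x, y),
                      make_v1ti (0xFFFF0F0FFFFF0F0FULL,
                                 0xFFFFFFFFFFFFFFFFULL)));
  assert (v1ti_equal (v1ti_xor (x, y),
                      make_v1ti (0xF0F00F0FF0F00F0FULL,
                                 0xFFFFFFFFFFFFFFFFULL)));
  assert (v1ti_equal (v1ti_not (x),
                      make_v1ti (0x0000FFFF0000FFFFULL,
                                 0xFFFFFFFF00000000ULL)));
}
```

To compare the generated code before and after the patch, one would compile the four one-line functions with optimization and inspect the assembly, as done in the message.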
Re: [PATCH 4/6] aarch64: Add machine modes for Neon vector-tuple types
Thanks a lot for doing this. Jonathan Wright writes: > @@ -763,9 +839,16 @@ aarch64_lookup_simd_builtin_type (machine_mode mode, > return aarch64_simd_builtin_std_type (mode, q); > >for (i = 0; i < nelts; i++) > -if (aarch64_simd_types[i].mode == mode > - && aarch64_simd_types[i].q == q) > - return aarch64_simd_types[i].itype; > +{ > + if (aarch64_simd_types[i].mode == mode > + && aarch64_simd_types[i].q == q) > + return aarch64_simd_types[i].itype; > + else if (aarch64_simd_tuple_types[i][0] != NULL_TREE) Very minor (sorry for not noticing earlier), but: the “else” is redundant here. > + for (int j = 0; j < 3; j++) > + if (TYPE_MODE (aarch64_simd_tuple_types[i][j]) == mode > + && aarch64_simd_types[i].q == q) > + return aarch64_simd_tuple_types[i][j]; > +} > >return NULL_TREE; > } > diff --git a/gcc/config/aarch64/aarch64-simd.md > b/gcc/config/aarch64/aarch64-simd.md > index > 48eddf64e05afe3788abfa05141f6544a9323ea1..0aa185b67ff13d40c87db0449aec312929ff5387 > 100644 > --- a/gcc/config/aarch64/aarch64-simd.md > +++ b/gcc/config/aarch64/aarch64-simd.md > @@ -6636,162 +6636,165 @@ > > ;; Patterns for vector struct loads and stores. > > -(define_insn "aarch64_simd_ld2" > - [(set (match_operand:OI 0 "register_operand" "=w") > - (unspec:OI [(match_operand:OI 1 "aarch64_simd_struct_operand" "Utv") > - (unspec:VQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY)] > -UNSPEC_LD2))] > +(define_insn "aarch64_simd_ld2" > + [(set (match_operand:VSTRUCT_2Q 0 "register_operand" "=w") > + (unspec:VSTRUCT_2Q [ > + (match_operand:VSTRUCT_2Q 1 "aarch64_simd_struct_operand" "Utv")] > + UNSPEC_LD2))] >"TARGET_SIMD" >"ld2\\t{%S0. 
- %T0.}, %1" >[(set_attr "type" "neon_load2_2reg")] > ) > > -(define_insn "aarch64_simd_ld2r" > - [(set (match_operand:OI 0 "register_operand" "=w") > - (unspec:OI [(match_operand:BLK 1 "aarch64_simd_struct_operand" "Utv") > - (unspec:VALLDIF [(const_int 0)] UNSPEC_VSTRUCTDUMMY) ] > - UNSPEC_LD2_DUP))] > +(define_insn "aarch64_simd_ld2r" > + [(set (match_operand:VSTRUCT_2QD 0 "register_operand" "=w") > + (unspec:VSTRUCT_2QD [ > + (match_operand:VSTRUCT_2QD 1 "aarch64_simd_struct_operand" "Utv")] > + UNSPEC_LD2_DUP))] Sorry again for missing this, but the ld2rs, ld3rs and ld4rs should keep their BLKmode arguments, since they only access 2, 3 or 4 scalar memory elements. > @@ -7515,10 +7605,10 @@ > ) > > (define_insn_and_split "aarch64_combinev16qi" > - [(set (match_operand:OI 0 "register_operand" "=w") > - (unspec:OI [(match_operand:V16QI 1 "register_operand" "w") > - (match_operand:V16QI 2 "register_operand" "w")] > -UNSPEC_CONCAT))] > + [(set (match_operand:V2x16QI 0 "register_operand" "=w") > + (unspec:V2x16QI [(match_operand:V16QI 1 "register_operand" "w") > + (match_operand:V16QI 2 "register_operand" "w")] > + UNSPEC_CONCAT))] Just realised that we can now make this a vec_concat, since the modes are finally self-consistent. No need to do that though, either way is fine. Looks good otherwise. Richard
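The point about ld2r, ld3r and ld4r reading only a few scalars can be illustrated with a portable model of what LD2R does. This is a plain-C sketch, not the intrinsic and not the backend pattern: the type `tuple2x4_model` stands in for a tuple of two 4-lane vectors, and the names `model_ld2r` and `check_model_ld2r` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

enum { LANES = 4 };

/* Stand-in for a tuple of two 4 x 32-bit vectors.  */
typedef struct { int32_t val[2][LANES]; } tuple2x4_model;

/* Portable model of LD2R: read exactly two consecutive scalars from MEM
   and replicate each one across all lanes of one result vector.  */
static tuple2x4_model
model_ld2r (const int32_t *mem)
{
  tuple2x4_model result;
  for (int vec = 0; vec < 2; vec++)
    for (int lane = 0; lane < LANES; lane++)
      result.val[vec][lane] = mem[vec];
  return result;
}

static void
check_model_ld2r (void)
{
  const int32_t mem[2] = { 10, 20 };
  tuple2x4_model r = model_ld2r (mem);

  for (int lane = 0; lane < LANES; lane++)
    {
      assert (r.val[0][lane] == 10);
      assert (r.val[1][lane] == 20);
    }

  /* The memory read (two scalars) is much smaller than the tuple
     that is written.  */
  assert (sizeof mem < sizeof r);
}
```

Because the load touches only two scalars while the destination is a full tuple, describing the memory operand with the tuple mode would overstate the access, which is why the review asks to keep BLKmode for these operands.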
Re: [PATCH 5/6] gcc/lower_subreg.c: Prevent decomposition if modes are not tieable
Jonathan Wright writes: > Hi, > > Preventing decomposition if modes are not tieable is necessary to > stop AArch64 partial Neon structure modes being treated as packed in > registers. > > This is a necessary prerequisite for a future AArch64 PCS change to > maintain good code generation. > > Bootstrapped and regression tested on: > * x86_64-pc-linux-gnu - no issues. > * aarch64-none-linux-gnu - two test failures which will be fixed by > the next patch in this series. > > Ok for master? > > Thanks, > Jonathan > > --- > > gcc/ChangeLog: > > 2021-10-14 Jonathan Wright > > * lower-subreg.c (simple_move): Prevent decomposition if > modes are not tieable. OK as a single commit with 6/6. Thanks for splitting this out for review purposes. Richard > > diff --git a/gcc/lower-subreg.c b/gcc/lower-subreg.c > index > 21078268ba0d241efc469fe71357d3b94f8935d6..f0dc63f485f1237d96ceeb0c75dca9aa8e053c6e > 100644 > --- a/gcc/lower-subreg.c > +++ b/gcc/lower-subreg.c > @@ -383,8 +383,10 @@ simple_move (rtx_insn *insn, bool speed_p) > non-integer mode for which there is no integer mode of the same > size. */ >mode = GET_MODE (SET_DEST (set)); > + scalar_int_mode int_mode; >if (!SCALAR_INT_MODE_P (mode) > - && !int_mode_for_size (GET_MODE_BITSIZE (mode), 0).exists ()) > + && (!int_mode_for_size (GET_MODE_BITSIZE (mode), 0).exists (&int_mode) > + || !targetm.modes_tieable_p (mode, int_mode))) > return NULL_RTX; > >/* Reject PARTIAL_INT modes. They are used for processor specific
Re: [PATCH 6/6] aarch64: Pass and return Neon vector-tuple types without a parallel
Jonathan Wright writes: > Hi, > > Neon vector-tuple types can be passed in registers on function call > and return - there is no need to generate a parallel rtx. This patch > adds cases to detect vector-tuple modes and generates an appropriate > register rtx. > > This change greatly improves code generated when passing Neon vector- > tuple types between functions; Indeed. > many new test cases are added to defend these improvements. > > Bootstrapped and regression tested on aarch64-none-linux-gnu and > aarch64_be-none-linux-gnu - no issues. > > Ok for master? > > Thanks, > Jonathan > > --- > > gcc/ChangeLog: > > 2021-10-07 Jonathan Wright > > * config/aarch64/aarch64.c (aarch64_function_value): Generate > a register rtx for Neon vector-tuple modes. > (aarch64_layout_arg): Likewise. > > gcc/testsuite/ChangeLog: > > * gcc.target/aarch64/vector_structure_intrinsics.c: New code > generation tests. OK, thanks. Richard > diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c > index > cbfcf7efcca8e0978518b69cbeafb6812c38889a..9c2b3cb7d677a1570b32a8c9b6ee14bef156cb45 > 100644 > --- a/gcc/config/aarch64/aarch64.c > +++ b/gcc/config/aarch64/aarch64.c > @@ -6433,6 +6433,12 @@ aarch64_function_value (const_tree type, const_tree > func, > gcc_assert (count == 1 && mode == ag_mode); > return gen_rtx_REG (mode, V0_REGNUM); > } > + else if (aarch64_advsimd_full_struct_mode_p (mode) > +&& known_eq (GET_MODE_SIZE (ag_mode), 16)) > + return gen_rtx_REG (mode, V0_REGNUM); > + else if (aarch64_advsimd_partial_struct_mode_p (mode) > +&& known_eq (GET_MODE_SIZE (ag_mode), 8)) > + return gen_rtx_REG (mode, V0_REGNUM); >else > { > int i; > @@ -6728,6 +6734,12 @@ aarch64_layout_arg (cumulative_args_t pcum_v, const > function_arg_info &arg) > gcc_assert (nregs == 1); > pcum->aapcs_reg = gen_rtx_REG (mode, V0_REGNUM + nvrn); > } > + else if (aarch64_advsimd_full_struct_mode_p (mode) > +&& known_eq (GET_MODE_SIZE (pcum->aapcs_vfp_rmode), 16)) > + pcum->aapcs_reg = 
gen_rtx_REG (mode, V0_REGNUM + nvrn); > + else if (aarch64_advsimd_partial_struct_mode_p (mode) > +&& known_eq (GET_MODE_SIZE (pcum->aapcs_vfp_rmode), 8)) > + pcum->aapcs_reg = gen_rtx_REG (mode, V0_REGNUM + nvrn); > else > { > rtx par; > diff --git a/gcc/testsuite/gcc.target/aarch64/vector_structure_intrinsics.c > b/gcc/testsuite/gcc.target/aarch64/vector_structure_intrinsics.c > index > 89e9de18a92dbc00e58261e4558b3cff38c7ca75..100739ab4e67e27a7341b8b1a4ddd9494f0e181d > 100644 > --- a/gcc/testsuite/gcc.target/aarch64/vector_structure_intrinsics.c > +++ b/gcc/testsuite/gcc.target/aarch64/vector_structure_intrinsics.c > @@ -17,6 +17,14 @@ TEST_TBL (vqtbl2q, int8x16_t, int8x16x2_t, uint8x16_t, s8) > TEST_TBL (vqtbl2q, uint8x16_t, uint8x16x2_t, uint8x16_t, u8) > TEST_TBL (vqtbl2q, poly8x16_t, poly8x16x2_t, uint8x16_t, p8) > > +TEST_TBL (vqtbl3, int8x8_t, int8x16x3_t, uint8x8_t, s8) > +TEST_TBL (vqtbl3, uint8x8_t, uint8x16x3_t, uint8x8_t, u8) > +TEST_TBL (vqtbl3, poly8x8_t, poly8x16x3_t, uint8x8_t, p8) > + > +TEST_TBL (vqtbl3q, int8x16_t, int8x16x3_t, uint8x16_t, s8) > +TEST_TBL (vqtbl3q, uint8x16_t, uint8x16x3_t, uint8x16_t, u8) > +TEST_TBL (vqtbl3q, poly8x16_t, poly8x16x3_t, uint8x16_t, p8) > + > TEST_TBL (vqtbl4, int8x8_t, int8x16x4_t, uint8x8_t, s8) > TEST_TBL (vqtbl4, uint8x8_t, uint8x16x4_t, uint8x8_t, u8) > TEST_TBL (vqtbl4, poly8x8_t, poly8x16x4_t, uint8x8_t, p8) > @@ -25,62 +33,35 @@ TEST_TBL (vqtbl4q, int8x16_t, int8x16x4_t, uint8x16_t, s8) > TEST_TBL (vqtbl4q, uint8x16_t, uint8x16x4_t, uint8x16_t, u8) > TEST_TBL (vqtbl4q, poly8x16_t, poly8x16x4_t, uint8x16_t, p8) > > -#define TEST_TBL3(name, rettype, tbltype, idxtype, ts) \ > - rettype test_ ## name ## _ ## ts (idxtype a, tbltype b) \ > - { \ > - return name ## _ ## ts (b, a); \ > - } > - > -TEST_TBL3 (vqtbl3, int8x8_t, int8x16x3_t, uint8x8_t, s8) > -TEST_TBL3 (vqtbl3, uint8x8_t, uint8x16x3_t, uint8x8_t, u8) > -TEST_TBL3 (vqtbl3, poly8x8_t, poly8x16x3_t, uint8x8_t, p8) > - > -TEST_TBL3 (vqtbl3q, int8x16_t, 
int8x16x3_t, uint8x16_t, s8) > -TEST_TBL3 (vqtbl3q, uint8x16_t, uint8x16x3_t, uint8x16_t, u8) > -TEST_TBL3 (vqtbl3q, poly8x16_t, poly8x16x3_t, uint8x16_t, p8) > - > -#define TEST_TBX2(name, rettype, tbltype, idxtype, ts) \ > - rettype test_ ## name ## _ ## ts (rettype a, idxtype b, tbltype c) \ > - { \ > - return name ## _ ## ts (a, c, b); \ > - } > - > -TEST_TBX2 (vqtbx2, int8x8_t, int8x16x2_t, uint8x8_t, s8) > -TEST_TBX2 (vqtbx2, uint8x8_t, uint8x16x2_t, uint8x8_t, u8) > -TEST_TBX2 (vqtbx2, poly8x8_t, poly8x16x2_t, uint8x8_t, p8) > - > -TEST_TBX2 (vqtbx2q, int8x16_t, int8x16x2_t, uint8x16_t, s8) > -TEST_TBX2 (vqtbx2q, uint8x16_t, uint8x16x2_t, uint8x16_t, u8) > -TEST_TBX2 (vqtbx2q,
Re: [PATCH] Try to resolve paths in threader without looking further back.
On Fri, Oct 22, 2021 at 4:27 PM Martin Sebor wrote: > > On 10/22/21 5:22 AM, Aldy Hernandez wrote: > > On Thu, Oct 21, 2021 at 4:51 PM Martin Sebor wrote: > > > >> I'd like to see gimple-ssa-array-bounds invoked from the access > >> pass too (instead of from VRP), and eventually -Wrestrict as well. > > > > You can do that right now. The pass has been converted to the new API > > and it would just require calling it with a ranger instead of the > > vr_values from VRP: > > > >array_bounds_checker array_checker (fun, &vrp_vr_values); > >array_checker.check (); > > > > That is, move it where you want and pass it a fresh new gimple_ranger. > > If there are any regressions, we'd be glad to look at them. > > I appreciate that and I'm not worried about regressions due to > ranger code. > > It's not so simple as it seems because of the optimizer > dependencies I mentioned. -Warray-bounds runs before vectorization > and the access pass after it. Moving it into the access pass will > mean dealing with the fallout: either accepting regressions in > the quality of warnings (bad locations due to vectorization > merging distinct stores into one) or running the access pass at > a different point in the pipeline, and facing regressions in > the other warnings due to that. Running it twice, once earlier > for -Warray-bounds and then again later for -Wstringop-overflow > etc, would be less than optimal because they all do the same > thing (compute object sizes and offsets) and should be able to > share the same data (the pointer query cache). So the ideal > solution is to find a middle ground where all these warnings > can run from the same pass with optimal results. > > -Warray-bounds might also need to be adjusted for -O0 to avoid > warning on unreachable code, although, surprisingly, that hasn't > been an issue for the other warnings now enabled at -O0. > > All this will take some time, which I'm out of for this stage 1. 
> > > > >> I'm not sure about the strlen/sprintf warnings; those might need > >> to stay where they are because they run as part of the optimizers > >> there. > >> > >> (By the way, I don't see range info in the access pass at -O0. > >> Should I?) > > > > I assume you mean you don't see anything in the dump files. > > I mean that I don't get accurate range info from the ranger > instance in any function. I'd like the example below to trigger > a warning even at -O0 but it doesn't because n's range is > [0, UINT_MAX] instead of [7, UINT_MAX]: > >char a[4]; > >void f (unsigned n) >{ > if (n < 7) >n = 7; > __builtin_memset (a, 0, n); >} Breakpoint 5, get_size_range (query=0x0, bound=, range=0x7fffda10, bndrng=0x7fffdc98) at /home/aldyh/src/gcc/gcc/gimple-ssa-warn-access.cc:1196 (gdb) p debug_ranger() ;; Function f === BB 2 Imports: n_3(D) Exports: n_3(D) n_3(D)unsigned int VARYING : if (n_3(D) <= 6) goto ; [INV] else goto ; [INV] 2->3 (T) n_3(D) : unsigned int [0, 6] 2->4 (F) n_3(D) : unsigned int [7, +INF] === BB 3 : n_4 = 7; n_4 : unsigned int [7, 7] === BB 4 : # n_2 = PHI _1 = (long unsigned int) n_2; __builtin_memset (&a, 0, _1); return; _1 : long unsigned int [7, 4294967295] n_2 : unsigned int [7, +INF] Non-varying global ranges: =: _1 : long unsigned int [7, 4294967295] n_2 : unsigned int [7, +INF] n_4 : unsigned int [7, 7] From the above it looks like _1 at BB4 is [7, 4294967295]. You probably want: range_of_expr (r, tree_for_ssa_1, gimple_for_the_memset_call) BTW, debug_ranger() tells you everything ranger would know for the given IL. It's meant as a debugging aid. You may want to look at its source to see how it calls the ranger. Aldy
Re: [PATCH][WIP] Add install-dvi Makefile targets
On 10/18/2021 7:30 PM, Eric Gallager wrote: On Tue, Oct 12, 2021 at 5:09 PM Eric Gallager wrote: On Thu, Oct 6, 2016 at 10:41 AM Eric Gallager wrote: Currently the build machinery handles install-pdf and install-html targets, but no install-dvi target. This patch is a step towards fixing that. Note that I have only tested with --enable-languages=c,c++,lto,objc,obj-c++. Thus, target hooks will probably also have to be added for the languages I skipped. Also, please note that this patch applies on top of: https://gcc.gnu.org/ml/gcc-patches/2016-10/msg00370.html ChangeLog: 2016-10-06 Eric Gallager * Makefile.def: Handle install-dvi target. * Makefile.tpl: Likewise. * Makefile.in: Regenerate. gcc/ChangeLog: 2016-10-06 Eric Gallager * Makefile.in: Handle dvidir and install-dvi target. * ./[c|cp|lto|objc|objcp]/Make-lang.in: Add dummy install-dvi target hooks. * configure.ac: Handle install-dvi target. * configure: Regenerate. libiberty/ChangeLog: 2016-10-06 Eric Gallager * Makefile.in: Handle dvidir and install-dvi target. * functions.texi: Regenerate. Ping. The prerequisite patch that I linked to previously has gone in now. I'm not sure if this specific patch still applies, though. Also note that I've opened a bug to track this issue: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102663 Hi, I have updated this patch and tested it with more languages now; I can now confirm that it works with ada, d, and fortran now. The only languages that remain untested now are go (since I'm building on darwin and go doesn't build on darwin anyways, as per bug 46986) and jit (which I ran into a bug about that I brought up on IRC, and will probably need to file on bugzilla). OK to install? Yea, I think this is OK. We might need to adjust go/jit and perhaps other toplevel modules, but if those do show up as problems I think we can fault in those fixes. jeff
Re: [PATCH 2/3][vect] Consider outside costs earlier for epilogue loops
"Andre Vieira (lists) via Gcc-patches" writes: > Hi, > > This patch changes the order in which we check outside and inside costs > for epilogue loops, this is to ensure that a predicated epilogue is more > likely to be picked over an unpredicated one, since it saves having to > enter a scalar epilogue loop. > > gcc/ChangeLog: > > * tree-vect-loop.c (vect_better_loop_vinfo_p): Change how > epilogue loop costs are compared. OK, thanks. Sorry for the slow review. Richard > diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c > index > 14f8150d7c262b9422784e0e997ca4387664a20a..038af13a91d43c9f09186d042cf415020ea73a38 > 100644 > --- a/gcc/tree-vect-loop.c > +++ b/gcc/tree-vect-loop.c > @@ -2881,17 +2881,75 @@ vect_better_loop_vinfo_p (loop_vec_info > new_loop_vinfo, > return new_simdlen_p; > } > > + loop_vec_info main_loop = LOOP_VINFO_ORIG_LOOP_INFO (old_loop_vinfo); > + if (main_loop) > +{ > + poly_uint64 main_poly_vf = LOOP_VINFO_VECT_FACTOR (main_loop); > + unsigned HOST_WIDE_INT main_vf; > + unsigned HOST_WIDE_INT old_factor, new_factor, old_cost, new_cost; > + /* If we can determine how many iterations are left for the epilogue > + loop, that is if both the main loop's vectorization factor and number > + of iterations are constant, then we use them to calculate the cost of > + the epilogue loop together with a 'likely value' for the epilogues > + vectorization factor. Otherwise we use the main loop's vectorization > + factor and the maximum poly value for the epilogue's. If the target > + has not provided with a sensible upper bound poly vectorization > + factors are likely to be favored over constant ones. 
*/ > + if (main_poly_vf.is_constant (&main_vf) > + && LOOP_VINFO_NITERS_KNOWN_P (main_loop)) > + { > + unsigned HOST_WIDE_INT niters > + = LOOP_VINFO_INT_NITERS (main_loop) % main_vf; > + HOST_WIDE_INT old_likely_vf > + = estimated_poly_value (old_vf, POLY_VALUE_LIKELY); > + HOST_WIDE_INT new_likely_vf > + = estimated_poly_value (new_vf, POLY_VALUE_LIKELY); > + > + /* If the epilogue is using partial vectors we account for the > + partial iteration here too. */ > + old_factor = niters / old_likely_vf; > + if (LOOP_VINFO_USING_PARTIAL_VECTORS_P (old_loop_vinfo) > + && niters % old_likely_vf != 0) > + old_factor++; > + > + new_factor = niters / new_likely_vf; > + if (LOOP_VINFO_USING_PARTIAL_VECTORS_P (new_loop_vinfo) > + && niters % new_likely_vf != 0) > + new_factor++; > + } > + else > + { > + unsigned HOST_WIDE_INT main_vf_max > + = estimated_poly_value (main_poly_vf, POLY_VALUE_MAX); > + > + old_factor = main_vf_max / estimated_poly_value (old_vf, > +POLY_VALUE_MAX); > + new_factor = main_vf_max / estimated_poly_value (new_vf, > +POLY_VALUE_MAX); > + > + /* If the loop is not using partial vectors then it will iterate one > + time less than one that does. It is safe to subtract one here, > + because the main loop's vf is always at least 2x bigger than that > + of an epilogue. */ > + if (!LOOP_VINFO_USING_PARTIAL_VECTORS_P (old_loop_vinfo)) > + old_factor -= 1; > + if (!LOOP_VINFO_USING_PARTIAL_VECTORS_P (new_loop_vinfo)) > + new_factor -= 1; > + } > + > + /* Compute the costs by multiplying the inside costs with the factor > and > + add the outside costs for a more complete picture. The factor is the > + amount of times we are expecting to iterate this epilogue. 
*/ > + old_cost = old_loop_vinfo->vec_inside_cost * old_factor; > + new_cost = new_loop_vinfo->vec_inside_cost * new_factor; > + old_cost += old_loop_vinfo->vec_outside_cost; > + new_cost += new_loop_vinfo->vec_outside_cost; > + return new_cost < old_cost; > +} > + >/* Limit the VFs to what is likely to be the maximum number of iterations, > to handle cases in which at least one loop_vinfo is fully-masked. */ > - HOST_WIDE_INT estimated_max_niter; > - loop_vec_info main_loop = LOOP_VINFO_ORIG_LOOP_INFO (old_loop_vinfo); > - unsigned HOST_WIDE_INT main_vf; > - if (main_loop > - && LOOP_VINFO_NITERS_KNOWN_P (main_loop) > - && LOOP_VINFO_VECT_FACTOR (main_loop).is_constant (&main_vf)) > -estimated_max_niter = LOOP_VINFO_INT_NITERS (main_loop) % main_vf; > - else > -estimated_max_niter = likely_max_stmt_executions_int (loop); > + HOST_WIDE_INT estimated_max_niter = likely_max_stmt_executions_int (loop); >if (estimated_max_niter != -1) > { >if (known_le (estimated_max_niter, new_vf))
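For readers following the cost logic in the hunk above, here is a minimal Python sketch of the comparison. It is an illustration only, not the GCC code: the names EpilogueCandidate, epilogue_cost and new_is_better are made up for this note, and vectorization factors are treated as plain integers (that is, the estimated_poly_value step is assumed to have been applied already).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EpilogueCandidate:
    """Hypothetical stand-in for the loop_vec_info fields the patch reads."""
    vf: int                # vectorization factor, already reduced to an int
    inside_cost: int       # cost of one vector iteration
    outside_cost: int      # prologue/epilogue cost paid once
    partial_vectors: bool  # True for a predicated (partial-vector) epilogue


def epilogue_cost(cand: EpilogueCandidate, main_vf: int,
                  main_niters: Optional[int] = None) -> int:
    """Estimate the cost of running `cand` as the epilogue of the main loop."""
    if main_niters is not None:
        # Known trip count: the epilogue sees whatever the main loop leaves.
        remaining = main_niters % main_vf
        factor = remaining // cand.vf
        # A predicated epilogue also handles the final partial iteration.
        if cand.partial_vectors and remaining % cand.vf != 0:
            factor += 1
    else:
        # Unknown trip count: bound the work by the main loop's VF.
        factor = main_vf // cand.vf
        # An unpredicated epilogue iterates one time less than a predicated one.
        if not cand.partial_vectors:
            factor -= 1
    # Inside cost scaled by the expected iteration count, plus outside cost.
    return cand.inside_cost * factor + cand.outside_cost


def new_is_better(new: EpilogueCandidate, old: EpilogueCandidate,
                  main_vf: int, main_niters: Optional[int] = None) -> bool:
    """Mirror of the final `return new_cost < old_cost` in the patch."""
    return (epilogue_cost(new, main_vf, main_niters)
            < epilogue_cost(old, main_vf, main_niters))


predicated = EpilogueCandidate(vf=8, inside_cost=10, outside_cost=2,
                               partial_vectors=True)
unpredicated = EpilogueCandidate(vf=4, inside_cost=6, outside_cost=20,
                                 partial_vectors=False)
print(new_is_better(predicated, unpredicated, main_vf=16, main_niters=100))
```

With these made-up numbers the predicated candidate wins mainly because its outside cost is small, which is the effect the patch description is after: outside costs are folded into the comparison rather than considered only as a tie-breaker.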
Re: [PATCH] Try to resolve paths in threader without looking further back.
On 10/22/21 9:18 AM, Aldy Hernandez wrote: On Fri, Oct 22, 2021 at 4:27 PM Martin Sebor wrote: On 10/22/21 5:22 AM, Aldy Hernandez wrote: On Thu, Oct 21, 2021 at 4:51 PM Martin Sebor wrote: I'd like to see gimple-ssa-array-bounds invoked from the access pass too (instead of from VRP), and eventually -Wrestrict as well. You can do that right now. The pass has been converted to the new API and it would just require calling it with a ranger instead of the vr_values from VRP: array_bounds_checker array_checker (fun, &vrp_vr_values); array_checker.check (); That is, move it where you want and pass it a fresh new gimple_ranger. If there are any regressions, we'd be glad to look at them. I appreciate that and I'm not worried about regressions due to ranger code. It's not so simple as it seems because of the optimizer dependencies I mentioned. -Warray-bounds runs before vectorization and the access pass after it. Moving it into the access pass will mean dealing with the fallout: either accepting regressions in the quality of warnings (bad locations due to vectorization merging distinct stores into one) or running the access pass at a different point in the pipeline, and facing regressions in the other warnings due to that. Running it twice, once earlier for -Warray-bounds and then again later for -Wstringop-overflow etc, would be less than optimal because they all do the same thing (compute object sizes and offsets) and should be able to share the same data (the pointer query cache). So the ideal solution is to find a middle ground where all these warnings can run from the same pass with optimal results. -Warray-bounds might also need to be adjusted for -O0 to avoid warning on unreachable code, although, surprisingly, that hasn't been an issue for the other warnings now enabled at -O0. All this will take some time, which I'm out of for this stage 1. 
I'm not sure about the strlen/sprintf warnings; those might need to stay where they are because they run as part of the optimizers there. (By the way, I don't see range info in the access pass at -O0. Should I?) I assume you mean you don't see anything in the dump files. I mean that I don't get accurate range info from the ranger instance in any function. I'd like the example below to trigger a warning even at -O0 but it doesn't because n's range is [0, UINT_MAX] instead of [7, UINT_MAX]: char a[4]; void f (unsigned n) { if (n < 7) n = 7; __builtin_memset (a, 0, n); } Breakpoint 5, get_size_range (query=0x0, bound=, range=0x7fffda10, bndrng=0x7fffdc98) at /home/aldyh/src/gcc/gcc/gimple-ssa-warn-access.cc:1196 (gdb) p debug_ranger() ;; Function f === BB 2 Imports: n_3(D) Exports: n_3(D) n_3(D)unsigned int VARYING : if (n_3(D) <= 6) goto ; [INV] else goto ; [INV] 2->3 (T) n_3(D) : unsigned int [0, 6] 2->4 (F) n_3(D) : unsigned int [7, +INF] === BB 3 : n_4 = 7; n_4 : unsigned int [7, 7] === BB 4 : # n_2 = PHI _1 = (long unsigned int) n_2; __builtin_memset (&a, 0, _1); return; _1 : long unsigned int [7, 4294967295] n_2 : unsigned int [7, +INF] Non-varying global ranges: =: _1 : long unsigned int [7, 4294967295] n_2 : unsigned int [7, +INF] n_4 : unsigned int [7, 7] From the above it looks like _1 at BB4 is [7, 4294967295]. Great! You probably want: range_of_expr (r, tree_for_ssa_1, gimple_for_the_memset_call) That's what the function does. But its caller doesn't have access to the Gimple statement so it passes in null instead. Presumably without it, range_of_expr() doesn't have enough context to know what BB I'm asking about. It does work without the statement at -O but then there's just one BB (the if statement becomes a MAX_EXPR) so there's just one range. BTW, debug_ranger() tells you everything ranger would know for the given IL. It's meant as a debugging aid. You may want to look at its source to see how it calls the ranger. Thanks for the tip. I should do that. 
There's a paradigm shift from the old ways of working with ranges and the new way, and it will take a bit of adjusting to. I just haven't spent enough time working with Ranger to be there. But this exchange alone was already very helpful! Martin
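As an aside for readers of the archive, the ranges in the debug_ranger() dump quoted in this thread can be reproduced with a toy interval model. The Python below is only a sketch of the idea (ranges refined on edges, then merged at the PHI); it does not use or imitate the ranger API, the helper names are invented, and the PHI arguments are assumed to be n_3(D) from the 2->4 edge and n_4 from BB 3, since the dump above lost them.

```python
UINT_MAX = 2**32 - 1
VARYING = (0, UINT_MAX)  # "could be anything" for a 32-bit unsigned value


def intersect(a, b):
    """Intersection of two closed intervals, or None if it is empty."""
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    return (lo, hi) if lo <= hi else None


def hull(a, b):
    """Smallest single interval covering both inputs (used for the PHI)."""
    return (min(a[0], b[0]), max(a[1], b[1]))


def ranges_for_f():
    """Walk the three basic blocks of f() by hand."""
    n_3 = VARYING                                  # parameter on entry to BB 2
    n_3_true = intersect(n_3, (0, 6))              # edge 2->3: n_3 <= 6 holds
    n_3_false = intersect(n_3, (7, UINT_MAX))      # edge 2->4: n_3 <= 6 fails
    n_4 = (7, 7)                                   # BB 3: n_4 = 7
    n_2 = hull(n_3_false, n_4)                     # BB 4: PHI of both paths
    return {"n_3": n_3, "n_3 on 2->3": n_3_true,
            "n_3 on 2->4": n_3_false, "n_4": n_4, "n_2": n_2}


for name, rng in ranges_for_f().items():
    print(name, rng)
```

The point the sketch makes is the one discussed in the thread: the parameter on its own is VARYING, and only the value as seen at the memset call (after the edge refinement and the PHI) has the lower bound of 7, so a query needs to say where in the function it is asking.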
[PATCH] rs6000: Add optimizations for _mm_sad_epu8
The Power9 ISA added the `vabsdub` instruction, which is exposed through the `vec_absd` intrinsic. Use `vec_absd` for the `_mm_sad_epu8` compatibility intrinsic when `_ARCH_PWR9` is defined. Also, the realization of `vec_sum2s` on little-endian includes two shifts in order to position the input and output to match the semantics of `vec_sum2s`: - Shift the second input vector left 12 bytes. In the current usage, that vector is `{0}`, so this shift is unnecessary, but is currently not eliminated under optimization. - Shift the vector produced by the `vsum2sws` instruction left 4 bytes. The two words within each doubleword of this (shifted) result must then be explicitly swapped to match the semantics of `_mm_sad_epu8`, effectively reversing this shift. So, this shift (and a subsequent swap) are unnecessary, but not currently removed under optimization. Using `__builtin_altivec_vsum2sws` retains both shifts, so is not an option for removing the shifts. For little-endian, use the `vsum2sws` instruction directly, and eliminate the explicit shift (swap). 2021-10-22 Paul A. Clarke gcc * config/rs6000/emmintrin.h (_mm_sad_epu8): Use vec_absd when _ARCH_PWR9, optimize vec_sum2s when LE. --- Tested on powerpc64le-linux on Power9, with and without `-mcpu=power9`, and on powerpc/powerpc64-linux on Power8. OK for trunk? 
gcc/config/rs6000/emmintrin.h | 24 +--- 1 file changed, 17 insertions(+), 7 deletions(-) diff --git a/gcc/config/rs6000/emmintrin.h b/gcc/config/rs6000/emmintrin.h index ab16c13c379e..c4758be0e777 100644 --- a/gcc/config/rs6000/emmintrin.h +++ b/gcc/config/rs6000/emmintrin.h @@ -2197,27 +2197,37 @@ extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __arti _mm_sad_epu8 (__m128i __A, __m128i __B) { __v16qu a, b; - __v16qu vmin, vmax, vabsdiff; + __v16qu vabsdiff; __v4si vsum; const __v4su zero = { 0, 0, 0, 0 }; __v4si result; a = (__v16qu) __A; b = (__v16qu) __B; - vmin = vec_min (a, b); - vmax = vec_max (a, b); +#ifndef _ARCH_PWR9 + __v16qu vmin = vec_min (a, b); + __v16qu vmax = vec_max (a, b); vabsdiff = vec_sub (vmax, vmin); +#else + vabsdiff = vec_absd (a, b); +#endif /* Sum four groups of bytes into integers. */ vsum = (__vector signed int) vec_sum4s (vabsdiff, zero); +#ifdef __LITTLE_ENDIAN__ + /* Sum across four integers with two integer results. */ + asm ("vsum2sws %0,%1,%2" : "=v" (result) : "v" (vsum), "v" (zero)); + /* Note: vec_sum2s could be used here, but on little-endian, vector + shifts are added that are not needed for this use-case. + A vector shift to correctly position the 32-bit integer results + (currently at [0] and [2]) to [1] and [3] would then need to be + swapped back again since the desired results are two 64-bit + integers ([1]|[0] and [3]|[2]). Thus, no shift is performed. */ +#else /* Sum across four integers with two integer results. */ result = vec_sum2s (vsum, (__vector signed int) zero); /* Rotate the sums into the correct position. */ -#ifdef __LITTLE_ENDIAN__ - result = vec_sld (result, result, 4); -#else result = vec_sld (result, result, 6); #endif - /* Rotate the sums into the correct position. */ return (__m128i) result; } -- 2.27.0
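For reference, the result that every variant of this function has to produce can be written as a small scalar model. The Python below is a sketch added for illustration, with invented function names; it models the x86 `_mm_sad_epu8` semantics (two sums of eight absolute byte differences, one per 64-bit half) and is not a model of the Power instructions themselves.

```python
def sad_epu8(a, b):
    """Scalar model of _mm_sad_epu8 on two 16-byte inputs.

    Returns (low, high): the sum of absolute differences of bytes 0..7
    and of bytes 8..15, i.e. the values of the two 64-bit result lanes.
    """
    if len(a) != 16 or len(b) != 16:
        raise ValueError("both inputs must be exactly 16 bytes")
    diffs = [abs(x - y) for x, y in zip(a, b)]
    return sum(diffs[:8]), sum(diffs[8:])


def sad_epu8_minmax(a, b):
    """Same result via max - min, as in the pre-Power9 code path."""
    if len(a) != 16 or len(b) != 16:
        raise ValueError("both inputs must be exactly 16 bytes")
    diffs = [max(x, y) - min(x, y) for x, y in zip(a, b)]
    return sum(diffs[:8]), sum(diffs[8:])


a = bytes(range(16))
b = bytes(16)
print(sad_epu8(a, b))         # (28, 92)
print(sad_epu8_minmax(a, b))  # (28, 92)
```

The two helpers agree because max(x, y) - min(x, y) equals |x - y| for unsigned bytes, which is why replacing the `vec_min`/`vec_max`/`vec_sub` sequence with `vec_absd` does not change the result.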
Re: [PATCH] Handle jobserver file descriptors in btest.
On Fri, Oct 22, 2021 at 1:15 AM Martin Liška wrote: > > On 10/21/21 20:15, Ian Lance Taylor wrote: > > On Thu, Oct 21, 2021 at 12:48 AM Martin Liška wrote: > >> > >> The patch is about sensitive handling of file descriptors opened > >> by make's jobserver. > > > > Thanks. I think a better approach would be, at the start of main, > > fstat the descriptors up to 10 and record the ones for which fstat > > succeeds. Then at the end of main only check the descriptors for > > which fstat failed earlier. > > Sure, makes sense. > > > > > I can work on that at some point if you don't want to tackle it. > > I've just done that in the attached patch. > > Is it fine? This is OK. Thanks. Ian
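The approach Ian describes (record which descriptors are already open at startup, then only complain about the others at exit) can be sketched as follows. This is a Python illustration of the idea, not the btest.c change itself; the function names and the default limit of 10 are assumptions for the example, the latter taken from the "up to 10" in the message.

```python
import os


def open_descriptors(limit=10):
    """Return the set of descriptors in [0, limit) for which fstat succeeds."""
    found = set()
    for fd in range(limit):
        try:
            os.fstat(fd)
        except OSError:
            continue  # not open (EBADF): nothing to record
        found.add(fd)
    return found


def leaked_descriptors(inherited, limit=10):
    """Descriptors open now that were NOT open at startup.

    Anything in `inherited` (for example jobserver pipes passed down by
    make) is ignored, so it can never be reported as a leak.
    """
    return sorted(open_descriptors(limit) - set(inherited))


if __name__ == "__main__":
    inherited = open_descriptors()       # snapshot at the start of main
    # ... the code under test would run here ...
    print("leaked:", leaked_descriptors(inherited))
```

The design choice is that the check no longer needs to know anything about make or the jobserver: whatever the parent process handed over is simply excluded from the leak test.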
[PATCH] sra: Fix the fix for PR 102505 (PR 102886)
Hi, I was not careful with the fix for PR 102505 and did not craft the check to satisfy the verifier carefully, which led to PR 102886. (The verifier has the test structured differently and somewhat redundantly, so I could not just copy it). This patch fixes it. I hope it is a quite obvious correction of an oversight and so I will commit it if it survives bootstrap and testing on x86_64-linux and ppc64le-linux. Testcase for this bug is gcc.dg/tree-ssa/sra-18.c (but only on platforms with constant pools). I will backport the two fixes to the release branches squashed. Sorry for the stupid mistake, Martin gcc/ChangeLog: 2021-10-22 Martin Jambor PR tree-optimization/102886 * tree-sra.c (totally_scalarize_subtree): Fix the out of access-condition. --- gcc/tree-sra.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c index f561e1a2133..76e3aae405c 100644 --- a/gcc/tree-sra.c +++ b/gcc/tree-sra.c @@ -3288,7 +3288,7 @@ totally_scalarize_subtree (struct access *root) continue; HOST_WIDE_INT pos = root->offset + int_bit_position (fld); - if (pos + fsize > root->size) + if (pos + fsize > root->offset + root->size) return false; enum total_sra_field_state state = total_should_skip_creating_access (root, -- 2.33.0
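To make the one-line change easier to see, here is a small Python sketch of the two conditions. It is an illustration under assumptions: the function names are invented, all quantities are plain bit counts, and the example values are made up rather than taken from the PR.

```python
def field_fits(root_offset, root_size, field_bit_pos, field_size):
    """Fixed check: the field must end within [offset, offset + size)."""
    pos = root_offset + field_bit_pos          # absolute start of the field
    return pos + field_size <= root_offset + root_size


def field_fits_before_fix(root_offset, root_size, field_bit_pos, field_size):
    """Previous check: compares an absolute end against a relative size."""
    pos = root_offset + field_bit_pos
    return pos + field_size <= root_size


# A 64-bit access that starts 64 bits into the aggregate, with a 32-bit
# field in its second half.
print(field_fits(64, 64, 32, 32))             # True
print(field_fits_before_fix(64, 64, 32, 32))  # False
```

Both versions agree whenever the root access starts at offset 0, which is presumably why the mismatch only shows up for accesses located further into an aggregate.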
[Fortran, committed] Add testcase for PR 94289
I've committed this slightly cleaned-up version of the testcase originally submitted with the now-fixed issue PR 94289. -Sandra commit c31d2d14f798dc7ca9cc078200d37113749ec3bd Author: Sandra Loosemore Date: Fri Oct 22 11:08:19 2021 -0700 Add testcase for PR fortran/94289 2021-10-22 José Rui Faustino de Sousa Sandra Loosemore gcc/testsuite/ PR fortran/94289 * gfortran.dg/PR94289.f90: New. diff --git a/gcc/testsuite/gfortran.dg/PR94289.f90 b/gcc/testsuite/gfortran.dg/PR94289.f90 new file mode 100644 index 000..4f17d97 --- /dev/null +++ b/gcc/testsuite/gfortran.dg/PR94289.f90 @@ -0,0 +1,168 @@ +! { dg-do run } +! +! Testcase for PR 94289 +! +! - if the dummy argument is a pointer/allocatable, it has the same +! bounds as the dummy argument +! - if is is nonallocatable nonpointer, the lower bounds are [1, 1, 1]. + +module bounds_m + + implicit none + + private + public :: & +lb, ub + + public :: & +bnds_p, & +bnds_a, & +bnds_e + + integer, parameter :: lb1 = 3 + integer, parameter :: lb2 = 5 + integer, parameter :: lb3 = 9 + integer, parameter :: ub1 = 4 + integer, parameter :: ub2 = 50 + integer, parameter :: ub3 = 11 + integer, parameter :: ex1 = ub1 - lb1 + 1 + integer, parameter :: ex2 = ub2 - lb2 + 1 + integer, parameter :: ex3 = ub3 - lb3 + 1 + + integer, parameter :: lf(*) = [1,1,1] + integer, parameter :: lb(*) = [lb1,lb2,lb3] + integer, parameter :: ub(*) = [ub1,ub2,ub3] + integer, parameter :: ex(*) = [ex1,ex2,ex3] + +contains + + subroutine bounds(a, lb, ub) +integer, pointer, intent(in) :: a(..) +integer, intent(in) :: lb(3) +integer, intent(in) :: ub(3) + +integer :: ex(3) + +ex = max(ub-lb+1, 0) +if(any(lbound(a)/=lb)) stop 101 +if(any(ubound(a)/=ub)) stop 102 +if(any( shape(a)/=ex)) stop 103 +return + end subroutine bounds + + subroutine bnds_p(this) +integer, pointer, intent(in) :: this(..) 
+ +if(any(lbound(this)/=lb)) stop 1 +if(any(ubound(this)/=ub)) stop 2 +if(any( shape(this)/=ex)) stop 3 +call bounds(this, lb, ub) +return + end subroutine bnds_p + + subroutine bnds_a(this) +integer, allocatable, target, intent(in) :: this(..) + +if(any(lbound(this)/=lb)) stop 4 +if(any(ubound(this)/=ub)) stop 5 +if(any( shape(this)/=ex)) stop 6 +call bounds(this, lb, ub) +return + end subroutine bnds_a + + subroutine bnds_e(this) +integer, target, intent(in) :: this(..) + +if(any(lbound(this)/=lf)) stop 7 +if(any(ubound(this)/=ex)) stop 8 +if(any( shape(this)/=ex)) stop 9 +call bounds(this, lf, ex) +return + end subroutine bnds_e + +end module bounds_m + +program bounds_p + + use, intrinsic :: iso_c_binding, only: c_int + + use bounds_m + + implicit none + + integer, parameter :: fpn = 1 + integer, parameter :: fan = 2 + integer, parameter :: fon = 3 + + integer :: i + + do i = fpn, fon +call test_p(i) + end do + do i = fpn, fon +call test_a(i) + end do + do i = fpn, fon +call test_e(i) + end do + stop + +contains + + subroutine test_p(t) +integer, intent(in) :: t + +integer, pointer :: a(:,:,:) + +allocate(a(lb(1):ub(1),lb(2):ub(2),lb(3):ub(3))) +select case(t) +case(fpn) + call bnds_p(a) +case(fan) +case(fon) + call bnds_e(a) +case default + stop +end select +deallocate(a) +return + end subroutine test_p + + subroutine test_a(t) +integer, intent(in) :: t + +integer, allocatable, target :: a(:,:,:) + +allocate(a(lb(1):ub(1),lb(2):ub(2),lb(3):ub(3))) +select case(t) +case(fpn) + call bnds_p(a) +case(fan) + call bnds_a(a) +case(fon) + call bnds_e(a) +case default + stop +end select +deallocate(a) +return + end subroutine test_a + + subroutine test_e(t) +integer, intent(in) :: t + +integer, target :: a(lb(1):ub(1),lb(2):ub(2),lb(3):ub(3)) + +select case(t) +case(fpn) + call bnds_p(a) +case(fan) +case(fon) + call bnds_e(a) +case default + stop +end select +return + end subroutine test_e + +end program bounds_p
[PATCH] PR fortran/102816 - [12 Regression] ICE in resolve_structure_cons, at fortran/resolve.c:1467
Dear Fortranners, the recently introduced shape validation for array components in DT constructors did not properly deal with invalid code created by ingenious testers. Obvious solution: replace the gcc_assert by a suitable error message. Regarding the error message: before the shape validation, gfortran would emit the same error message twice referring to the same line, namely the bad declaration of the component. With the attached patch we get one error message for the bad declaration of the component, and one for the structure constructor referring to that DT component. One could easily change that and make the second message refer to the same as the declaration, giving two errors for the same line. Comments / opinions? Regtested on x86_64-pc-linux-gnu. OK? Thanks, Harald Fortran: error recovery on initializing invalid derived type array component gcc/fortran/ChangeLog: PR fortran/102816 * resolve.c (resolve_structure_cons): Reject invalid array spec of a DT component referred in a structure constructor. gcc/testsuite/ChangeLog: PR fortran/102816 * gfortran.dg/pr102816.f90: New test. 
diff --git a/gcc/fortran/resolve.c b/gcc/fortran/resolve.c index 5ccd9072c24..dc4ca5ef818 100644 --- a/gcc/fortran/resolve.c +++ b/gcc/fortran/resolve.c @@ -1463,8 +1463,15 @@ resolve_structure_cons (gfc_expr *expr, int init) mpz_init (len); for (int n = 0; n < rank; n++) { - gcc_assert (comp->as->upper[n]->expr_type == EXPR_CONSTANT - && comp->as->lower[n]->expr_type == EXPR_CONSTANT); + if (comp->as->upper[n]->expr_type != EXPR_CONSTANT + || comp->as->lower[n]->expr_type != EXPR_CONSTANT) + { + gfc_error ("Bad array spec of component %qs referred in " + "structure constructor at %L", + comp->name, &cons->expr->where); + t = false; + break; + }; mpz_set_ui (len, 1); mpz_add (len, len, comp->as->upper[n]->value.integer); mpz_sub (len, len, comp->as->lower[n]->value.integer); diff --git a/gcc/testsuite/gfortran.dg/pr102816.f90 b/gcc/testsuite/gfortran.dg/pr102816.f90 new file mode 100644 index 000..46831743b2b --- /dev/null +++ b/gcc/testsuite/gfortran.dg/pr102816.f90 @@ -0,0 +1,9 @@ +! { dg-do compile } +! PR fortran/102816 + +program p + type t + integer :: a([2]) ! { dg-error "must be scalar" } + end type + type(t) :: x = t([3, 4]) ! { dg-error "Bad array spec of component" } +end
Re: José's pending bind(C) patches / status (was: Re: [Patch, fortran V3] PR fortran/100683 - Array initialization refuses valid)
Hi Tobias, On 22.10.21 at 15:06, Tobias Burnus wrote: https://gcc.gnu.org/pipermail/fortran/2021-April/055934.html PR100103 - Automatic reallocation fails inside select rank Still segfaults at runtime for 'that = a' where the RHS is a parameter and the LHS an allocatable assumed-rank array (inside select rank). TODO: Review patch This one LGTM. Thanks for the patch! Harald
Re: José's pending bind(C) patches / status (was: Re: [Patch, fortran V3] PR fortran/100683 - Array initialization refuses valid)
Hi Tobias, José, On 22.10.21 at 15:06, Tobias Burnus wrote: https://gcc.gnu.org/pipermail/fortran/2021-April/055949.html PR100136 - ICE, regression, using flag -fcheck=pointer First testcase has an ICE with -fcheck=pointer Second testcase has always an ICE + possibly missing func. TODO: Review patch – and probably: follow-up patch for remaining issue I think this LGTM. Thanks for the patch! Harald
Re: José's pending bind(C) patches / status (was: Re: [Patch, fortran V3] PR fortran/100683 - Array initialization refuses valid)
Hi Tobias, José, On 22.10.21 at 15:06, Tobias Burnus wrote: https://gcc.gnu.org/pipermail/fortran/2021-April/055982.html PR100245 - ICE on automatic reallocation. Still ICEs TODO: Review patch. This one works and LGTM. Thanks for the patch! Harald
Re: [PATCH] Port update-copyright.py to Python3
Hi! On 2021-01-04T11:15:22+0100, Martin Liška wrote: > The patch ports the script to Python3. Turns out, there is another issue, observed in combination with a few "BadYear" occurrences due to "improper" copyright lines (Bill, for your information). OK to push "Fix 'contrib/update-copyright.py': 'TypeError: exceptions must derive from BaseException'" as well as "Fix 'Copyright (C) 2020-21' into '2020-2021'", see attached? Regards, Thomas - Siemens Electronic Design Automation GmbH; Address: Arnulfstraße 201, 80634 München; limited liability company; Managing Directors: Thomas Heurung, Frank Thürauf; Registered office: München; Commercial register: München, HRB 106955 From 3cffeead3b7f900999ec7885ae044e63e44deff3 Mon Sep 17 00:00:00 2001 From: Thomas Schwinge Date: Fri, 22 Oct 2021 15:54:42 +0200 Subject: [PATCH] Fix 'contrib/update-copyright.py': 'TypeError: exceptions must derive from BaseException' Running 'contrib/update-copyright.py' currently fails: [...] Traceback (most recent call last): File "contrib/update-copyright.py", line 365, in update_copyright canon_form = self.canonicalise_years (dir, filename, filter, years) File "contrib/update-copyright.py", line 270, in canonicalise_years (min_year, max_year) = self.year_range (years) File "contrib/update-copyright.py", line 253, in year_range year_list = [self.parse_year (year) File "contrib/update-copyright.py", line 253, in year_list = [self.parse_year (year) File "contrib/update-copyright.py", line 250, in parse_year raise self.BadYear (string) TypeError: exceptions must derive from BaseException During handling of the above exception, another exception occurred: Traceback (most recent call last): File "contrib/update-copyright.py", line 796, in GCCCmdLine().main() File "contrib/update-copyright.py", line 527, in main self.copyright.process_tree (dir, filter) File "contrib/update-copyright.py", line 458, in process_tree self.process_file (dir, filename, filter) File "contrib/update-copyright.py", 
line 421, in process_file res = self.update_copyright (dir, filename, filter, File "contrib/update-copyright.py", line 366, in update_copyright except self.BadYear as e: TypeError: catching classes that do not inherit from BaseException is not allowed Fix up for commit 3b25e83536bcd1b2977659a2c6d9f0f9bf2a3152 "Port update-copyright.py to Python3". contrib/ * update-copyright.py (class BadYear): Derive from 'Exception'. --- contrib/update-copyright.py | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/contrib/update-copyright.py b/contrib/update-copyright.py index 2b2bb11d2e6..d13b963a147 100755 --- a/contrib/update-copyright.py +++ b/contrib/update-copyright.py @@ -1,6 +1,6 @@ #!/usr/bin/env python3 # -# Copyright (C) 2013-2020 Free Software Foundation, Inc. +# Copyright (C) 2013-2021 Free Software Foundation, Inc. # # This script is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by @@ -233,7 +233,7 @@ class Copyright: def add_external_author (self, holder): self.holders[holder] = None -class BadYear(): +class BadYear (Exception): def __init__ (self, year): self.year = year -- 2.33.0 From 881f3e7701ab7ae5269db72cb33a7879b7e94e09 Mon Sep 17 00:00:00 2001 From: Thomas Schwinge Date: Fri, 22 Oct 2021 16:01:54 +0200 Subject: [PATCH] Fix 'Copyright (C) 2020-21' into '2020-2021' 'contrib/update-copyright.py' currently complains: gcc/config/rs6000/rs6000-gen-builtins.c: unrecognised year: 21 gcc/config/rs6000/rs6000-overload.def: unrecognised year: 21 gcc/config/rs6000/rbtree.h: unrecognised year: 21 gcc/config/rs6000/rbtree.c: unrecognised year: 21 gcc/config/rs6000/rs6000-builtin-new.def: unrecognised year: 21 Fix up files added in commit fa5f8b49e55caf5bb341f5eb6b5ab828b9286425 "rs6000: Red-black tree implementation for balanced tree search", commit 4a720a9547320699aceda7d2e0b08de5ab40132f "rs6000: Add initial input files", commit bd5b625228d545d5ecb35df24f9f094edc95e3fa 
"rs6000: Initial create of rs6000-gen-builtins.c". gcc/ * config/rs6000/rbtree.c: Fix 'Copyright (C) 2020-21' into '2020-2021' * config/rs6000/rbtree.h: Likewise. * config/rs6000/rs6000-builtin-new.def: Likewise. * config/rs6000/rs6000-gen-builtins.c: Likewise. * config/rs6000/rs6000-overload.def: Likewise. --- gcc/config/rs6000/rbtree.c | 2 +- gcc/config/rs6000/rbtree.h | 2 +- gcc/config/rs6000/rs6000-builtin-new.def | 2 +- gcc/config/rs6000/rs6000-gen-builtins.c | 2 +- gcc/config/rs6000/rs6000-overload.def| 2 +- 5 files changed, 5 insertions(+), 5 deletions(-) diff --git a/gcc/config/rs6000/rbtree.c b/gcc/config/rs6000/rbtree.c index 37a559c1fbc..d3d03a62433 100644
[committed] Fortran: Avoid running into assert with -fcheck= + UBSAN [PR92621]
The testcase from the PR (also attached) gave an ICE, but only when compiled with -fcheck=all -fsanitize=undefined. Solution: Strip the NOP to avoid the assert failure. Committed as r12-4632-g24e99e6ec1cc57f3660c00ff677c7feb16aa94d2 Tobias * * * PS: Similar issues when using additional flags: ICE also with -fcheck=all -fsanitize=undefined: https://gcc.gnu.org/PR102901 ICE (segfault) when compiling pdt_13.f03 with -fcheck=all in gfc_check_pdt_dummy -> structure_alloc_comps https://gcc.gnu.org/PR102900 ICE via gfc_class_data_get with alloc_comp_class_4.f03 or proc_ptr_52.f90 using -fcheck=all + runtime same flags but running the code: https://gcc.gnu.org/PR102903 New: Invalid gfortran.dg testcases or wrong-code with -fcheck=all -fsanitize=undefined Here, false positives might/do surely exist as do testcase bugs. (And the list is incomplete.) + -flto fail (not really fitting into this series): https://gcc.gnu.org/PR102885 - [12 Regression] ICE when compiling gfortran.dg/bind_c_char_10.f90 with -flto since r12-4467-g64f9623765da3306 commit 24e99e6ec1cc57f3660c00ff677c7feb16aa94d2 Author: Tobias Burnus Date: Fri Oct 22 23:23:06 2021 +0200 Fortran: Avoid running into assert with -fcheck= + UBSAN PR fortran/92621 gcc/fortran/ * trans-expr.c (gfc_trans_assignment_1): Add STRIP_NOPS. gcc/testsuite/ * gfortran.dg/bind-c-intent-out-2.f90: New test. diff --git a/gcc/fortran/trans-expr.c b/gcc/fortran/trans-expr.c index 29697e69e75..2d7f9e0fb91 100644 --- a/gcc/fortran/trans-expr.c +++ b/gcc/fortran/trans-expr.c @@ -11727,6 +11727,7 @@ gfc_trans_assignment_1 (gfc_expr * expr1, gfc_expr * expr2, bool init_flag, tmp = INDIRECT_REF_P (lse.expr) ? 
gfc_build_addr_expr (NULL_TREE, lse.expr) : lse.expr; + STRIP_NOPS (tmp); /* We should only get array references here. */ gcc_assert (TREE_CODE (tmp) == POINTER_PLUS_EXPR diff --git a/gcc/testsuite/gfortran.dg/bind-c-intent-out-2.f90 b/gcc/testsuite/gfortran.dg/bind-c-intent-out-2.f90 new file mode 100644 index 000..fe8f6060f1f --- /dev/null +++ b/gcc/testsuite/gfortran.dg/bind-c-intent-out-2.f90 @@ -0,0 +1,39 @@ +! { dg-do run } +! { dg-additional-options "-fsanitize=undefined -fcheck=all" } + +! PR fortran/92621 + +subroutine hello(val) bind(c) + use, intrinsic :: iso_c_binding, only: c_int + + implicit none + + integer(kind=c_int), allocatable, intent(out) :: val(:) + + allocate(val(1)) + val = 2 + return +end subroutine hello + +program alloc_p + + use, intrinsic :: iso_c_binding, only: c_int + + implicit none + + interface +subroutine hello(val) bind(c) + import :: c_int + implicit none + integer(kind=c_int), allocatable, intent(out) :: val(:) +end subroutine hello + end interface + + integer(kind=c_int), allocatable :: a(:) + + allocate(a(1)) + a = 1 + call hello(a) + stop + +end program alloc_p
[committed] wwwdocs: gcc-5/changes.html: Update link to Intel's pcommit deprecation
--- htdocs/gcc-5/changes.html | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/htdocs/gcc-5/changes.html b/htdocs/gcc-5/changes.html index 05e796dd..2e2e20e6 100644 --- a/htdocs/gcc-5/changes.html +++ b/htdocs/gcc-5/changes.html @@ -1084,7 +1084,7 @@ are not listed here). IA-32/x86-64 Support for the https://software.intel.com/content/www/us/en/develop/blogs/deprecate-pcommit-instruction.html";>deprecated + href="https://www.intel.com/content/www/us/en/developer/articles/technical/deprecate-pcommit-instruction.html";>deprecated pcommit instruction has been removed. -- 2.33.0
[committed] Fortran: Change XFAIL to PASS
Committed as r12-4633-g030875c197e339542ddfcbad90cfc01263151bec. To reduce the XFAIL clutter in the *.sum files, this patch removes some pointless XFAILs: output that should be ignored is now pruned, and the warnings/errors currently emitted are checked explicitly. Tobias commit 030875c197e339542ddfcbad90cfc01263151bec Author: Tobias Burnus Date: Sat Oct 23 00:04:43 2021 +0200 Fortran: Change XFAIL to PASS Replace dg-excess-errors by dg-error/warning and dg-prune-output for more fine-grained output handling and to avoid XPASS. gcc/testsuite/ChangeLog: * gfortran.dg/associate_3.f03: Replace dg-excess-errors by other dg-* to change XFAIL to PASS. * gfortran.dg/binding_label_tests_4.f03: Likewise. * gfortran.dg/block_4.f08: Likewise. * gfortran.dg/charlen_04.f90: Likewise. * gfortran.dg/charlen_05.f90: Likewise. * gfortran.dg/charlen_06.f90: Likewise. * gfortran.dg/charlen_13.f90: Likewise. * gfortran.dg/coarray_9.f90: Likewise. * gfortran.dg/coarray_collectives_3.f90: Likewise. * gfortran.dg/data_invalid.f90: Likewise. * gfortran.dg/do_4.f: Likewise. * gfortran.dg/dollar_sym_1.f90: Likewise. * gfortran.dg/dollar_sym_3.f: Likewise. * gfortran.dg/fmt_tab_1.f90: Likewise. * gfortran.dg/fmt_tab_2.f90: Likewise. * gfortran.dg/forall_16.f90: Likewise. * gfortran.dg/g77/970125-0.f: Likewise. * gfortran.dg/gomp/unexpected-end.f90: Likewise. * gfortran.dg/interface_operator_1.f90: Likewise. * gfortran.dg/interface_operator_2.f90: Likewise. * gfortran.dg/line_length_4.f90: Likewise. * gfortran.dg/line_length_5.f90: Likewise. * gfortran.dg/line_length_6.f90: Likewise. * gfortran.dg/line_length_8.f90: Likewise. * gfortran.dg/line_length_9.f90: Likewise. * gfortran.dg/pr65045.f90: Likewise. * gfortran.dg/pr69497.f90: Likewise. 
* gfortran.dg/submodule_21.f08: Likewise. * gfortran.dg/tab_continuation.f: Likewise. * gfortran.dg/typebound_proc_2.f90: Likewise. * gfortran.dg/warnings_are_errors_1.f90: Likewise. diff --git a/gcc/testsuite/gfortran.dg/associate_3.f03 b/gcc/testsuite/gfortran.dg/associate_3.f03 index da7bec951d1..dfd5a99500e 100644 --- a/gcc/testsuite/gfortran.dg/associate_3.f03 +++ b/gcc/testsuite/gfortran.dg/associate_3.f03 @@ -34,4 +34,4 @@ PROGRAM main INTEGER :: b ! { dg-error "Unexpected data declaration statement" } END ASSOCIATE END PROGRAM main ! { dg-error "Expecting END ASSOCIATE" } -! { dg-excess-errors "Unexpected end of file" } +! { dg-error "Unexpected end of file" "" { target "*-*-*" } 0 } diff --git a/gcc/testsuite/gfortran.dg/binding_label_tests_4.f03 b/gcc/testsuite/gfortran.dg/binding_label_tests_4.f03 index f8c0f046063..af9a588cfec 100644 --- a/gcc/testsuite/gfortran.dg/binding_label_tests_4.f03 +++ b/gcc/testsuite/gfortran.dg/binding_label_tests_4.f03 @@ -20,4 +20,4 @@ module C use A use B ! { dg-error "Cannot open module file" } end module C -! { dg-excess-errors "compilation terminated" } +! { dg-prune-output "compilation terminated" } diff --git a/gcc/testsuite/gfortran.dg/block_4.f08 b/gcc/testsuite/gfortran.dg/block_4.f08 index 4c63194c85d..3ff52b0a098 100644 --- a/gcc/testsuite/gfortran.dg/block_4.f08 +++ b/gcc/testsuite/gfortran.dg/block_4.f08 @@ -15,4 +15,4 @@ PROGRAM main myname2: BLOCK END BLOCK ! { dg-error "Expected block name of 'myname2'" } END PROGRAM main ! { dg-error "Expecting END BLOCK" } -! { dg-excess-errors "Unexpected end of file" } +! { dg-error "Unexpected end of file" "" { target "*-*-*" } 0 } diff --git a/gcc/testsuite/gfortran.dg/charlen_04.f90 b/gcc/testsuite/gfortran.dg/charlen_04.f90 index f93465f2ae6..97aa0ec583c 100644 --- a/gcc/testsuite/gfortran.dg/charlen_04.f90 +++ b/gcc/testsuite/gfortran.dg/charlen_04.f90 @@ -3,6 +3,5 @@ program p type t character(*), allocatable :: x(*) ! 
{ dg-error "must have a deferred shape" } - end type + end type ! { dg-error "needs to be a constant specification" "" { target "*-*-*" } .-1 } end -! { dg-excess-errors "needs to be a constant specification" } diff --git a/gcc/testsuite/gfortran.dg/charlen_05.f90 b/gcc/testsuite/gfortran.dg/charlen_05.f90 index 0eb0015bf38..e58f9263330 100644 --- a/gcc/testsuite/gfortran.dg/charlen_05.f90 +++ b/gcc/testsuite/gfortran.dg/charlen_05.f90 @@ -3,6 +3,5 @@ program p type t character(*) :: x y ! { dg-error "error in data declar
[committed] libstdc++: Constrain std::make_any [PR102894]
std::make_any should be constrained so it can only be called if the construction of the return value would be valid. Tested x86_64-linux, committed to trunk. I plan to backport this too. libstdc++-v3/ChangeLog: PR libstdc++/102894 * include/std/any (make_any): Add SFINAE constraint. * testsuite/20_util/any/102894.cc: New test. --- libstdc++-v3/include/std/any | 13 + libstdc++-v3/testsuite/20_util/any/102894.cc | 20 2 files changed, 29 insertions(+), 4 deletions(-) create mode 100644 libstdc++-v3/testsuite/20_util/any/102894.cc diff --git a/libstdc++-v3/include/std/any b/libstdc++-v3/include/std/any index 9c102a58b26..f75dddf6d92 100644 --- a/libstdc++-v3/include/std/any +++ b/libstdc++-v3/include/std/any @@ -428,16 +428,21 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION /// Exchange the states of two @c any objects. inline void swap(any& __x, any& __y) noexcept { __x.swap(__y); } - /// Create an any holding a @c _Tp constructed from @c __args. + /// Create an `any` holding a `_Tp` constructed from `__args...`. template <typename _Tp, typename... _Args> -any make_any(_Args&&... __args) +inline +enable_if_t<is_constructible_v<any, in_place_type_t<_Tp>, _Args...>, any> +make_any(_Args&&... __args) { return any(in_place_type<_Tp>, std::forward<_Args>(__args)...); } - /// Create an any holding a @c _Tp constructed from @c __il and @c __args. + /// Create an `any` holding a `_Tp` constructed from `__il` and `__args...`. template <typename _Tp, typename _Up, typename... _Args> -any make_any(initializer_list<_Up> __il, _Args&&... __args) +inline +enable_if_t<is_constructible_v<any, in_place_type_t<_Tp>, + initializer_list<_Up>&, _Args...>, any> +make_any(initializer_list<_Up> __il, _Args&&... 
__args) { return any(in_place_type<_Tp>, __il, std::forward<_Args>(__args)...); } diff --git a/libstdc++-v3/testsuite/20_util/any/102894.cc b/libstdc++-v3/testsuite/20_util/any/102894.cc new file mode 100644 index 000..66ea9a03fea --- /dev/null +++ b/libstdc++-v3/testsuite/20_util/any/102894.cc @@ -0,0 +1,20 @@ +// { dg-do compile { target c++17 } } +#include + +template +struct can_make_any +: std::false_type +{ }; + +template +struct can_make_any())>> +: std::true_type +{ }; + +struct move_only +{ + move_only() = default; + move_only(move_only&&) = default; +}; + +static_assert( ! can_make_any::value ); // PR libstdc++/102894 -- 2.31.1
[committed] doc: Convert mingw-w64.org links to https
It turns out my link checker does catch broken links under gcc.gnu.org/install/ - fixed thusly. (That makes it all the more puzzling how the issue you fixed last week did not arise, Jonathan.) Gerald gcc: * doc/install.texi (Binaries): Convert mingw-w64.org to https. (Specific): Ditto. --- gcc/doc/install.texi | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/gcc/doc/install.texi b/gcc/doc/install.texi index 7c775965964..38f96bf5a89 100644 --- a/gcc/doc/install.texi +++ b/gcc/doc/install.texi @@ -3473,7 +3473,7 @@ Microsoft Windows: The @uref{https://sourceware.org/cygwin/,,Cygwin} project; @item The @uref{https://osdn.net/projects/mingw/,,MinGW} and -@uref{http://www.mingw-w64.org/,,mingw-w64} projects. +@uref{https://www.mingw-w64.org/,,mingw-w64} projects. @end itemize @item @@ -5080,7 +5080,7 @@ the Win32 subsystem that provides a subset of POSIX. @subheading Intel 64-bit versions GCC contains support for x86-64 using the mingw-w64 -runtime library, available from @uref{http://mingw-w64.org/doku.php}. +runtime library, available from @uref{https://mingw-w64.org/doku.php}. This library should be used with the target triple x86_64-pc-mingw32. Presently Windows for Itanium is not supported. -- 2.33.0
Re: [PATCH][WIP] Add install-dvi Makefile targets
On Fri, Oct 22, 2021 at 8:23 AM Jeff Law wrote: > > > > On 10/18/2021 7:30 PM, Eric Gallager wrote: > > On Tue, Oct 12, 2021 at 5:09 PM Eric Gallager wrote: > >> On Thu, Oct 6, 2016 at 10:41 AM Eric Gallager wrote: > >>> Currently the build machinery handles install-pdf and install-html > >>> targets, but no install-dvi target. This patch is a step towards > >>> fixing that. Note that I have only tested with > >>> --enable-languages=c,c++,lto,objc,obj-c++. Thus, target hooks will > >>> probably also have to be added for the languages I skipped. > >>> Also, please note that this patch applies on top of: > >>> https://gcc.gnu.org/ml/gcc-patches/2016-10/msg00370.html > >>> > >>> ChangeLog: > >>> > >>> 2016-10-06 Eric Gallager > >>> > >>> * Makefile.def: Handle install-dvi target. > >>> * Makefile.tpl: Likewise. > >>> * Makefile.in: Regenerate. > >>> > >>> gcc/ChangeLog: > >>> > >>> 2016-10-06 Eric Gallager > >>> > >>> * Makefile.in: Handle dvidir and install-dvi target. > >>> * ./[c|cp|lto|objc|objcp]/Make-lang.in: Add dummy install-dvi > >>> target hooks. > >>> * configure.ac: Handle install-dvi target. > >>> * configure: Regenerate. > >>> > >>> libiberty/ChangeLog: > >>> > >>> 2016-10-06 Eric Gallager > >>> > >>> * Makefile.in: Handle dvidir and install-dvi target. > >>> * functions.texi: Regenerate. > >> Ping. The prerequisite patch that I linked to previously has gone in now. > >> I'm not sure if this specific patch still applies, though. > >> Also note that I've opened a bug to track this issue: > >> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102663 > > Hi, I have updated this patch and tested it with more languages now; I > > can now confirm that it works with ada, d, and fortran now. The only > > languages that remain untested now are go (since I'm building on > > darwin and go doesn't build on darwin anyways, as per bug 46986) and > > jit (which I ran into a bug about that I brought up on IRC, and will > > probably need to file on bugzilla). OK to install? 
> Yea, I think this is OK. We might need to adjust go/jit and perhaps > other toplevel modules, but if those do show up as problems I think we > can fault in those fixes. > > jeff OK thanks, installed as r12-4636: https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=c3e80a16af287e804b87b8015307085399755cd4
Re: [committed] doc: Convert mingw-w64.org links to https
On Fri, 22 Oct 2021, 23:28 Gerald Pfeifer, wrote: > It turns out my link checker does catch broken links under > gcc.gnu.org/install/ - fixed thusly. > > (That makes it all the more puzzling how the issue you fixed last > week did not arise, Jonathan.) > It didn't give a 404, there was a page at the end of the link, just an empty one. So it probably looks like a good link to your script. > Gerald > > > gcc: > * doc/install.texi (Binaries): Convert mingw-w64.org to https. > (Specific): Ditto. > --- > gcc/doc/install.texi | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/gcc/doc/install.texi b/gcc/doc/install.texi > index 7c775965964..38f96bf5a89 100644 > --- a/gcc/doc/install.texi > +++ b/gcc/doc/install.texi > @@ -3473,7 +3473,7 @@ Microsoft Windows: > The @uref{https://sourceware.org/cygwin/,,Cygwin} project; > @item > The @uref{https://osdn.net/projects/mingw/,,MinGW} and > -@uref{http://www.mingw-w64.org/,,mingw-w64} projects. > +@uref{https://www.mingw-w64.org/,,mingw-w64} projects. > @end itemize > > @item > @@ -5080,7 +5080,7 @@ the Win32 subsystem that provides a subset of POSIX. > > @subheading Intel 64-bit versions > GCC contains support for x86-64 using the mingw-w64 > -runtime library, available from @uref{http://mingw-w64.org/doku.php}. > +runtime library, available from @uref{https://mingw-w64.org/doku.php}. > This library should be used with the target triple x86_64-pc-mingw32. > > Presently Windows for Itanium is not supported. > -- > 2.33.0 >
Re: [committed] doc: Convert mingw-w64.org links to https
On Sat, 23 Oct 2021, 00:43 Jonathan Wakely, wrote: > > > On Fri, 22 Oct 2021, 23:28 Gerald Pfeifer, wrote: > >> It turns out my link checker does catch broken links under >> gcc.gnu.org/install/ - fixed thusly. >> >> (That makes it all the more puzzling how the issue you fixed last >> week did not arise, Jonathan.) >> > > It didn't give a 404, there was a page at the end of the link, just an > empty one. So it probably looks like a good link to your script. > Maybe something is (or was?) still generating old.html, as an empty page: https://gcc.gnu.org/install/old.html
Re: [committed] doc: Convert mingw-w64.org links to https
On Sat, 23 Oct 2021, Jonathan Wakely wrote: >> (That makes it all the more puzzling how the issue you fixed last >> week did not arise, Jonathan.) > It didn't give a 404, there was a page at the end of the link, just > an empty one. So it probably looks like a good link to your script. Yes, as long as there is a page the link is considered valid, regardless of contents. On Sat, 23 Oct 2021, Jonathan Wakely wrote: > Maybe something is (or was?) still generating old.html, as an empty page: > > https://gcc.gnu.org/install/old.html Ahh, that got me thinking. Thank you for the hint, Jonathan! I know what happens and will address it (after a good night's sleep :-). Gerald
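The behaviour described above (a link counts as valid as long as a page exists, regardless of contents) can be sketched as follows, next to a stricter variant that would also flag an empty page such as old.html. This is not Gerald's actual checker: the function names and the idea of passing in an HTTP status code and page body are assumptions made purely for illustration.

```python
def link_ok_lenient(status, body):
    """Valid whenever a page exists; the contents are ignored."""
    return 200 <= status < 300


def link_ok_strict(status, body):
    """Hypothetical stricter check: an empty page counts as broken."""
    return 200 <= status < 300 and bool(body.strip())
```

Under the lenient rule an empty page served with status 200 passes, which matches why the empty old.html was not reported; the strict rule would have caught it.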
[Fortran, committed] Add testcase for PR95196
I've committed another testcase from a bugzilla issue that now appears to be fixed. -Sandra commit 9a0e34eb45e36d4f90cedb61191fd31da0bab256 Author: Sandra Loosemore Date: Fri Oct 22 17:22:00 2021 -0700 Add testcase for PR fortran/95196 2021-10-22 José Rui Faustino de Sousa Sandra Loosemore gcc/testsuite/ PR fortran/95196 * gfortran.dg/PR95196.f90: New. diff --git a/gcc/testsuite/gfortran.dg/PR95196.f90 b/gcc/testsuite/gfortran.dg/PR95196.f90 new file mode 100644 index 000..14333e4 --- /dev/null +++ b/gcc/testsuite/gfortran.dg/PR95196.f90 @@ -0,0 +1,83 @@ +! { dg-do run } + +program rnk_p + + implicit none + + integer, parameter :: n = 10 + integer, parameter :: m = 5 + integer, parameter :: s = 4 + integer, parameter :: l = 4 + integer, parameter :: u = s+l-1 + + integer :: a(n) + integer :: b(n,n) + integer :: c(n,n,n) + integer :: r(s*s*s) + integer :: i + + a = reshape([(i, i=1,n)], [n]) + b = reshape([(i, i=1,n*n)], [n,n]) + c = reshape([(i, i=1,n*n*n)], [n,n,n]) + r(1:s) = a(l:u) + call rnk_s(a(l:u), r(1:s)) + r(1:s*s) = reshape(b(l:u,l:u), [s*s]) + call rnk_s(b(l:u,l:u), r(1:s*s)) + r = reshape(c(l:u,l:u,l:u), [s*s*s]) + call rnk_s(c(l:u,l:7,l:u), r) + stop + +contains + + subroutine rnk_s(a, b) +integer, intent(in) :: a(..) 
+integer, intent(in) :: b(:) + +!integer :: l(rank(a)), u(rank(a)) does not work due to Bug 94048 +integer, allocatable :: lb(:), ub(:) +integer :: i, j, k, l + +lb = lbound(a) +ub = ubound(a) +select rank(a) +rank(1) + if(any(lb/=lbound(a))) stop 11 + if(any(ub/=ubound(a))) stop 12 + if(size(a)/=size(b)) stop 13 + do i = 1, size(a) +if(a(i)/=b(i)) stop 14 + end do +rank(2) + if(any(lb/=lbound(a))) stop 21 + if(any(ub/=ubound(a))) stop 22 + if(size(a)/=size(b)) stop 23 + k = 0 + do j = 1, size(a, dim=2) +do i = 1, size(a, dim=1) + k = k + 1 + if(a(i,j)/=b(k)) stop 24 +end do + end do +rank(3) + if(any(lb/=lbound(a))) stop 31 + if(any(ub/=ubound(a))) stop 32 + if(size(a)/=size(b)) stop 33 + l = 0 + do k = 1, size(a, dim=3) +do j = 1, size(a, dim=2) + do i = 1, size(a, dim=1) +l = l + 1 +! print *, a(i,j,k), b(l) +if(a(i,j,k)/=b(l)) stop 34 + end do +end do + end do +rank default + stop 171 +end select +deallocate(lb, ub) +return + end subroutine rnk_s + +end program rnk_p +
[r12-4632 Regression] FAIL: gfortran.dg/bind-c-intent-out-2.f90 -Os (test for excess errors) on Linux/x86_64
On Linux/x86_64, 24e99e6ec1cc57f3660c00ff677c7feb16aa94d2 is the first bad commit commit 24e99e6ec1cc57f3660c00ff677c7feb16aa94d2 Author: Tobias Burnus Date: Fri Oct 22 23:23:06 2021 +0200 Fortran: Avoid running into assert with -fcheck= + UBSAN caused FAIL: gfortran.dg/bind-c-intent-out-2.f90 -O0 (test for excess errors) FAIL: gfortran.dg/bind-c-intent-out-2.f90 -O1 (test for excess errors) FAIL: gfortran.dg/bind-c-intent-out-2.f90 -O2 (test for excess errors) FAIL: gfortran.dg/bind-c-intent-out-2.f90 -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions (test for excess errors) FAIL: gfortran.dg/bind-c-intent-out-2.f90 -O3 -g (test for excess errors) FAIL: gfortran.dg/bind-c-intent-out-2.f90 -Os (test for excess errors) with GCC configured with ../../gcc/configure --prefix=/local/skpandey/gccwork/toolwork/gcc-bisect-master/master/r12-4632/usr --enable-clocale=gnu --with-system-zlib --with-demangler-in-ld --with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl --enable-libmpx x86_64-linux --disable-bootstrap To reproduce: $ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=gfortran.dg/bind-c-intent-out-2.f90 --target_board='unix{-m32}'" $ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=gfortran.dg/bind-c-intent-out-2.f90 --target_board='unix{-m32\ -march=cascadelake}'" $ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=gfortran.dg/bind-c-intent-out-2.f90 --target_board='unix{-m64}'" $ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=gfortran.dg/bind-c-intent-out-2.f90 --target_board='unix{-m64\ -march=cascadelake}'" (Please do not reply to this email, for question about this report, contact me at skpgkp2 at gmail dot com)
Cannot reproduce – Re: [r12-4632 Regression] FAIL: gfortran.dg/bind-c-intent-out-2.f90 -Os (test for excess errors) on Linux/x86_64
Hi, for some reason, I cannot reproduce this. I checked that I am in sync with master, and I also tried -m32 and -march=cascadelake, running both manually and via DejaGNU, but it passes here. Can someone who sees it show the excess error? Or was that a spurious issue which is now gone? Tobias On 23.10.21 06:43, sunil.k.pandey wrote: On Linux/x86_64, 24e99e6ec1cc57f3660c00ff677c7feb16aa94d2 is the first bad commit commit 24e99e6ec1cc57f3660c00ff677c7feb16aa94d2 Author: Tobias Burnus Date: Fri Oct 22 23:23:06 2021 +0200 Fortran: Avoid running into assert with -fcheck= + UBSAN caused FAIL: gfortran.dg/bind-c-intent-out-2.f90 -O0 (test for excess errors) FAIL: gfortran.dg/bind-c-intent-out-2.f90 -O1 (test for excess errors) FAIL: gfortran.dg/bind-c-intent-out-2.f90 -O2 (test for excess errors) FAIL: gfortran.dg/bind-c-intent-out-2.f90 -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions (test for excess errors) FAIL: gfortran.dg/bind-c-intent-out-2.f90 -O3 -g (test for excess errors) FAIL: gfortran.dg/bind-c-intent-out-2.f90 -Os (test for excess errors) with GCC configured with ../../gcc/configure --prefix=/local/skpandey/gccwork/toolwork/gcc-bisect-master/master/r12-4632/usr --enable-clocale=gnu --with-system-zlib --with-demangler-in-ld --with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl --enable-libmpx x86_64-linux --disable-bootstrap To reproduce: $ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=gfortran.dg/bind-c-intent-out-2.f90 --target_board='unix{-m32}'" $ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=gfortran.dg/bind-c-intent-out-2.f90 --target_board='unix{-m32\ -march=cascadelake}'" $ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=gfortran.dg/bind-c-intent-out-2.f90 --target_board='unix{-m64}'" $ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=gfortran.dg/bind-c-intent-out-2.f90 --target_board='unix{-m64\ -march=cascadelake}'" (Please do not reply to this email, for 
question about this report, contact me at skpgkp2 at gmail dot com)