Re: [PATCH 1/5] libstdc++: Import the fast_float library
* Patrick Palka via Libstdc:

> This copies the fast_float library[1] into the compiled-in library
> sources. We're going to use this library in our floating-point
> std::from_chars implementation for faster and more portable parsing of
> binary32/64 decimal strings.
>
> [1]: https://github.com/fastfloat/fast_float
>
> Series tested on x86_64, i686, ppc64, ppc64le and aarch64, does it
> look OK for trunk?

Missing Signed-off-by:?

> diff --git a/libstdc++-v3/src/c++17/fast_float/LICENSE-APACHE
> b/libstdc++-v3/src/c++17/fast_float/LICENSE-APACHE
> new file mode 100644
> index 000..26f4398f249
> --- /dev/null
> +++ b/libstdc++-v3/src/c++17/fast_float/LICENSE-APACHE
> @@ -0,0 +1,190 @@
> + Apache License
> + Version 2.0, January 2004
> +http://www.apache.org/licenses/

> diff --git a/libstdc++-v3/src/c++17/fast_float/LICENSE-MIT
> b/libstdc++-v3/src/c++17/fast_float/LICENSE-MIT
> new file mode 100644
> index 000..2fb2a37ad7f
> --- /dev/null
> +++ b/libstdc++-v3/src/c++17/fast_float/LICENSE-MIT
> @@ -0,0 +1,27 @@
> +MIT License
> +
> +Copyright (c) 2021 The fast_float authors

You also need to include the README file, which makes it clear that
recipients can choose between Apache and MIT. GCC needs to use the MIT
option, I think.

Thanks,
Florian
Re: [PATCH][GCC] arm: add armv9-a architecture to -march
Hi,

On Tue, Nov 9, 2021 at 12:36 PM Przemyslaw Wirkus via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:

> > > > -Original Message-
> > > > From: Przemyslaw Wirkus
> > > > Sent: 18 October 2021 10:37
> > > > To: gcc-patches@gcc.gnu.org
> > > > Cc: Richard Earnshaw; Ramana Radhakrishnan; Kyrylo Tkachov;
> > > > ni...@redhat.com
> > > > Subject: [PATCH][GCC] arm: add armv9-a architecture to -march
> > > >
> > > > Hi,
> > > >
> > > > This patch is adding `armv9-a` to -march in Arm GCC.
> > > >
> > > > In this patch:
> > > > + Add `armv9-a` to -march.
> > > > + Update multilib with armv9-a and armv9-a+simd.
> > > >
> > > > After this patch three additional multilib directories are available:
> > > >
> > > > $ arm-none-eabi-gcc --print-multi-lib
> > > > .;
> > > > [...vanilla multi-lib dirs...]
> > > > thumb/v9-a/nofp;@mthumb@march=armv9-a@mfloat-abi=soft
> > > > thumb/v9-a+simd/softfp;@mthumb@march=armv9-a+simd@mfloat-abi=softfp
> > > > thumb/v9-a+simd/hard;@mthumb@march=armv9-a+simd@mfloat-abi=hard
> > > >

This is causing a GCC build failure when using "old" binutils (I'm
using 2.36.1), because the new -march=armv9-a option is not supported.
This breaks the multilib support.

I don't remember how we handled similar cases in the past. Is that
just "expected", and "current" GCC needs "current" binutils, or should
we have a multilib list dependent on the actual binutils support?
(I think this is not the case, and it sounds like an undesirable extra
complication in an already overcrowded multilib-Makefile.)

Christophe

> > > > New multi-lib directories under
> > > > $GCC_INSTALL_DIE/lib/gcc/arm-none-eabi/12.0.0/thumb are created:
> > > >
> > > > thumb/
> > > > +--- v9-a
> > > > |    |--- nofp
> > > > |
> > > > +--- v9-a+simd
> > > >      |--- hard
> > > >      |--- softfp
> > > >
> > > > Regtested on arm-none-eabi cross and no issues.
> > > >
> > > > OK for master?
> > Thanks.
> > commit 32ba7860ccaddd5219e6dae94a3d0653e124c9dd
> > > Ok.
> > Thanks,
> > Kyrill
> > > >
> > > > gcc/ChangeLog:
> > > >
> > > > * config/arm/arm-cpus.in (armv9): New define.
> > > > (ARMv9a): New group.
> > > > (armv9-a): New arch definition.
> > > > * config/arm/arm-tables.opt: Regenerate.
> > > > * config/arm/arm.h (BASE_ARCH_9A): New arch enum value.
> > > > * config/arm/t-aprofile: Added armv9-a and armv9+simd.
> > > > * config/arm/t-arm-elf: Added arm9-a, v9_fps and all_v9_archs
> > > > to MULTILIB_MATCHES.
> > > > * config/arm/t-multilib: Added v9_a_nosimd_variants and
> > > > v9_a_simd_variants to MULTILIB_MATCHES.
> > > > * doc/invoke.texi: Update docs.
> > > >
> > > > gcc/testsuite/ChangeLog:
> > > >
> > > > * gcc.target/arm/multilib.exp: Update test with armv9-a entries.
> > > > * lib/target-supports.exp (v9a): Add new armflag.
> > > > (__ARM_ARCH_9A__): Add new armdef.
> > > >
> > > > --
> > > > kind regards,
> > > > Przemyslaw Wirkus
> >
[PATCH] i386: vcvtph2ps and vcvtps2ph should be used to convert _Float16 to SFmode with -mf16c [PR 102811]
Hi, vcvtph2ps and vcvtps2ph should be used to convert _Float16 to SFmode with -mf16c. So added define_insn extendhfsf2 and truncsfhf2 for target_f16c. OK for master? gcc/ChangeLog: PR target/102811 * config/i386/i386.md (extendhfsf2): Add extenndhfsf2 for f16c. (extendhfdf2): Split extendhf2 into separate extendhfsf2, extendhfdf2. (truncsfhf2): Likewise. (truncdfhf2): Likewise. gcc/testsuite/ChangeLog: PR target/102811 * gcc.target/i386/avx512vl-vcvtps2ph-pr102811.c: New test. --- gcc/config/i386/i386.md | 48 +++ .../i386/avx512vl-vcvtps2ph-pr102811.c| 10 2 files changed, 49 insertions(+), 9 deletions(-) create mode 100644 gcc/testsuite/gcc.target/i386/avx512vl-vcvtps2ph-pr102811.c diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md index 6eb9de81921..c5415475342 100644 --- a/gcc/config/i386/i386.md +++ b/gcc/config/i386/i386.md @@ -4574,15 +4574,30 @@ emit_move_insn (operands[0], CONST0_RTX (V2DFmode)); }) -(define_insn "extendhf2" - [(set (match_operand:MODEF 0 "nonimm_ssenomem_operand" "=v") -(float_extend:MODEF +(define_insn "extendhfsf2" + [(set (match_operand:SF 0 "register_operand" "=v") + (float_extend:SF + (match_operand:HF 1 "nonimmediate_operand" "vm")))] + "TARGET_AVX512FP16 || TARGET_F16C || TARGET_AVX512VL" +{ + if (TARGET_AVX512FP16) +return "vcvtsh2ss\t{%1, %0, %0|%0, %0, %1}"; + else +return "vcvtph2ps\t{%1, %0|%0, %1}"; } + [(set_attr "type" "ssecvt") + (set_attr "prefix" "maybe_evex") + (set_attr "mode" "SF")]) + +(define_insn "extendhfdf2" + [(set (match_operand:DF 0 "nonimm_ssenomem_operand" "=v") + (float_extend:DF (match_operand:HF 1 "nonimmediate_operand" "vm")))] "TARGET_AVX512FP16" - "vcvtsh2\t{%1, %0, %0|%0, %0, %1}" + "vcvtsh2sd\t{%1, %0, %0|%0, %0, %1}" [(set_attr "type" "ssecvt") (set_attr "prefix" "evex") - (set_attr "mode" "")]) + (set_attr "mode" "DF")]) (define_expand "extendxf2" @@ -4766,12 +4781,27 @@ ;; Conversion from {SF,DF}mode to HFmode. 
-(define_insn "trunchf2" +(define_insn "truncsfhf2" + [(set (match_operand:HF 0 "register_operand" "=v") + (float_truncate:HF + (match_operand:SF 1 "nonimmediate_operand" "vm")))] + "TARGET_AVX512FP16 || TARGET_F16C || TARGET_AVX512VL" + { +if (TARGET_AVX512FP16) + return "vcvtss2sh\t{%1, %d0|%d0, %1}"; +else + return "vcvtps2ph\t{0, %1, %0|%0, %1, 0}"; + } + [(set_attr "type" "ssecvt") + (set_attr "prefix" "evex") + (set_attr "mode" "HF")]) + +(define_insn "truncdfhf2" [(set (match_operand:HF 0 "register_operand" "=v") - (float_truncate:HF - (match_operand:MODEF 1 "nonimmediate_operand" "vm")))] + (float_truncate:HF + (match_operand:DF 1 "nonimmediate_operand" "vm")))] "TARGET_AVX512FP16" - "vcvt2sh\t{%1, %d0|%d0, %1}" + "vcvtsd2sh\t{%1, %d0|%d0, %1}" [(set_attr "type" "ssecvt") (set_attr "prefix" "evex") (set_attr "mode" "HF")]) diff --git a/gcc/testsuite/gcc.target/i386/avx512vl-vcvtps2ph-pr102811.c b/gcc/testsuite/gcc.target/i386/avx512vl-vcvtps2ph-pr102811.c new file mode 100644 index 000..ab44a304a03 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512vl-vcvtps2ph-pr102811.c @@ -0,0 +1,10 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -mf16c -mno-avx512fp16" } */ +/* { dg-final { scan-assembler-times "vcvtph2ps\[ \\t\]" 2 } } */ +/* { dg-final { scan-assembler-times "vcvtps2ph\[ \\t\]" 1 } } */ +/* { dg-final { scan-assembler-not "__truncsfhf2\[ \\t\]"} } */ +/* { dg-final { scan-assembler-not "__extendhfsf2\[ \\t\]"} } */ +_Float16 test (_Float16 a, _Float16 b) +{ + return a + b; +} -- 2.18.1
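For readers without F16C hardware at hand, the following is a minimal, portable sketch of what one lane of vcvtph2ps computes, that is, the binary16 to binary32 widening that extendhfsf2 is meant to produce. It assumes IEEE 754 binary16 and binary32 layouts. The helper name half_bits_to_float is hypothetical; it is not part of the patch or of GCC.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not from the patch): widen one IEEE 754 binary16
   value, given as raw bits, to binary32.  */
static float
half_bits_to_float (uint16_t h)
{
  uint32_t sign = (uint32_t) (h & 0x8000u) << 16;
  uint32_t exp = (h >> 10) & 0x1fu;
  uint32_t frac = h & 0x3ffu;
  uint32_t bits;

  if (exp == 0x1f)
    /* Infinity or NaN: all-ones exponent, payload shifted into place.  */
    bits = sign | 0x7f800000u | (frac << 13);
  else if (exp != 0)
    /* Normal number: rebias the exponent from 15 to 127.  */
    bits = sign | ((exp + 112) << 23) | (frac << 13);
  else if (frac == 0)
    /* Signed zero.  */
    bits = sign;
  else
    {
      /* Subnormal: the value is frac * 2^-24, so shift the fraction
         left until its implicit leading one (bit 10) appears.  */
      int e = -1;
      do
        {
          frac <<= 1;
          e++;
        }
      while ((frac & 0x400u) == 0);
      bits = sign | ((uint32_t) (112 - e) << 23) | ((frac & 0x3ffu) << 13);
    }

  float f;
  memcpy (&f, &bits, sizeof f);
  return f;
}
```

Each call models a single lane; the xmm form of vcvtph2ps converts four such values at once, which is why the contents of the other lanes matter when only a scalar is wanted.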
Re: Basic kill analysis for modref
> chain_map isn't initialized.
>
> This caused:
>
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103262
>

Hi,
this is the patch I committed that moves the misplaced hunk.

gcc/ChangeLog:

	PR ipa/103262
	* ipa-modref.c (merge_call_side_effects): Fix uninitialized access.

gcc/testsuite/ChangeLog:

	* gcc.dg/tree-ssa/modref-dse-5.c: New test.

diff --git a/gcc/ipa-modref.c b/gcc/ipa-modref.c
index df4612bbff9..4784f68f585 100644
--- a/gcc/ipa-modref.c
+++ b/gcc/ipa-modref.c
@@ -964,38 +980,6 @@ merge_call_side_effects (modref_summary *cur_summary,
   if (flags & (ECF_CONST | ECF_NOVOPS))
     return changed;

-  if (always_executed
-      && callee_summary->kills.length ()
-      && (!cfun->can_throw_non_call_exceptions
-          || !stmt_could_throw_p (cfun, stmt)))
-    {
-      /* Watch for self recursive updates. */
-      auto_vec saved_kills;
-
-      saved_kills.reserve_exact (callee_summary->kills.length ());
-      saved_kills.splice (callee_summary->kills);
-      for (auto kill : saved_kills)
-        {
-          if (kill.parm_index >= (int)parm_map.length ())
-            continue;
-          modref_parm_map &m
-            = kill.parm_index == MODREF_STATIC_CHAIN_PARM
-              ? chain_map
-              : parm_map[kill.parm_index];
-          if (m.parm_index == MODREF_LOCAL_MEMORY_PARM
-              || m.parm_index == MODREF_UNKNOWN_PARM
-              || m.parm_index == MODREF_RETSLOT_PARM
-              || !m.parm_offset_known)
-            continue;
-          modref_access_node n = kill;
-          n.parm_index = m.parm_index;
-          n.parm_offset += m.parm_offset;
-          if (modref_access_node::insert_kill (cur_summary->kills, n,
-                                               record_adjustments))
-            changed = true;
-        }
-    }
-
   /* We can not safely optimize based on summary of callee if it
      does not always bind to current def: it is possible that memory load
      was optimized out earlier which may not happen in the interposed
@@ -1043,6 +1027,38 @@ merge_call_side_effects (modref_summary *cur_summary,
   if (dump_file)
     fprintf (dump_file, "\n");

+  if (always_executed
+      && callee_summary->kills.length ()
+      && (!cfun->can_throw_non_call_exceptions
+          || !stmt_could_throw_p (cfun, stmt)))
+    {
+      /* Watch for self recursive updates. */
+      auto_vec saved_kills;
+
+      saved_kills.reserve_exact (callee_summary->kills.length ());
+      saved_kills.splice (callee_summary->kills);
+      for (auto kill : saved_kills)
+        {
+          if (kill.parm_index >= (int)parm_map.length ())
+            continue;
+          modref_parm_map &m
+            = kill.parm_index == MODREF_STATIC_CHAIN_PARM
+              ? chain_map
+              : parm_map[kill.parm_index];
+          if (m.parm_index == MODREF_LOCAL_MEMORY_PARM
+              || m.parm_index == MODREF_UNKNOWN_PARM
+              || m.parm_index == MODREF_RETSLOT_PARM
+              || !m.parm_offset_known)
+            continue;
+          modref_access_node n = kill;
+          n.parm_index = m.parm_index;
+          n.parm_offset += m.parm_offset;
+          if (modref_access_node::insert_kill (cur_summary->kills, n,
+                                               record_adjustments))
+            changed = true;
+        }
+    }
+
   /* Merge with callee's summary. */
   changed |= cur_summary->loads->merge (callee_summary->loads, &parm_map,
                                         &chain_map, record_adjustments);
+/* { dg-final { scan-tree-dump "Deleted dead store: kill_me" "dse2" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/modref-dse-5.c b/gcc/testsuite/gcc.dg/tree-ssa/modref-dse-5.c
new file mode 100644
index 000..ad35b70136f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/modref-dse-5.c
@@ -0,0 +1,43 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-dse2-details" } */
+struct a {int a,b,c;};
+__attribute__ ((noinline))
+void
+kill_me (struct a *a)
+{
+  a->a=0;
+  a->b=0;
+  a->c=0;
+}
+__attribute__ ((noinline))
+int
+wrap(int b, struct a *a)
+{
+  kill_me (a);
+  return b;
+}
+__attribute__ ((noinline))
+void
+my_pleasure (struct a *a)
+{
+  a->a=1;
+  a->c=2;
+}
+__attribute__ ((noinline))
+int
+wrap2(int b, struct a *a)
+{
+  my_pleasure (a);
+  return b;
+}
+
+int
+set (struct a *a)
+{
+  wrap (0, a);
+  int ret = wrap2 (0, a);
+  //int ret = my_pleasure (a);
+  a->b=1;
+  return ret;
+}
+/* { dg-final { scan-tree-dump "Deleted dead store: wrap" "dse2" } } */
[PATCH] i386: add alias for f*mul_*ch intrinsics
Hi, This patch is to add alias for f*mul_*ch intrinsics. Ok for master? gcc/ChangeLog: * config/i386/avx512fp16intrin.h (_mm512_mul_pch): Add alias for _mm512_fmul_pch. (_mm512_mask_mul_pch): Likewise. (_mm512_maskz_mul_pch): Likewise. (_mm512_mul_round_pch): Likewise. (_mm512_mask_mul_round_pch): Likewise. (_mm512_maskz_mul_round_pch): Likewise. (_mm512_cmul_pch): Likewise. (_mm512_mask_cmul_pch): Likewise. (_mm512_maskz_cmul_pch): Likewise. (_mm512_cmul_round_pch): Likewise. (_mm512_mask_cmul_round_pch): Likewise. (_mm512_maskz_cmul_round_pch): Likewise. (_mm_mul_sch): Likewise. (_mm_mask_mul_sch): Likewise. (_mm_maskz_mul_sch): Likewise. (_mm_mul_round_sch): Likewise. (_mm_mask_mul_round_sch): Likewise. (_mm_maskz_mul_round_sch): Likewise. (_mm_cmul_sch): Likewise. (_mm_mask_cmul_sch): Likewise. (_mm_maskz_cmul_sch): Likewise. (_mm_cmul_round_sch): Likewise. (_mm_mask_cmul_round_sch): Likewise. (_mm_maskz_cmul_round_sch): Likewise. * config/i386/avx512fp16vlintrin.h (_mm_mul_pch): Likewise. (_mm_mask_mul_pch): Likewise. (_mm_maskz_mul_pch): Likewise. (_mm256_mul_pch): Likewise. (_mm256_mask_mul_pch): Likewise. (_mm256_maskz_mul_pch): Likewise. (_mm_cmul_pch): Likewise. (_mm_mask_cmul_pch): Likewise. (_mm_maskz_cmul_pch): Likewise. (_mm256_cmul_pch): Likewise. (_mm256_mask_cmul_pch): Likewise. (_mm256_maskz_cmul_pch): Likewise. gcc/testsuite/ChangeLog: * gcc.target/i386/avx512fp16-vfcmulcph-1a.c: Add new test for alias. * gcc.target/i386/avx512fp16-vfcmulcsh-1a.c: Likewise. * gcc.target/i386/avx512fp16-vfmulcph-1a.c: Likewise. * gcc.target/i386/avx512fp16-vfmulcsh-1a.c: Likewise. * gcc.target/i386/avx512fp16vl-vfcmulcph-1a.c: Likewise. * gcc.target/i386/avx512fp16vl-vfmulcph-1a.c: Likewise. 
--- gcc/config/i386/avx512fp16intrin.h| 39 +++ gcc/config/i386/avx512fp16vlintrin.h | 17 .../gcc.target/i386/avx512fp16-vfcmulcph-1a.c | 19 ++--- .../gcc.target/i386/avx512fp16-vfcmulcsh-1a.c | 19 ++--- .../gcc.target/i386/avx512fp16-vfmulcph-1a.c | 19 ++--- .../gcc.target/i386/avx512fp16-vfmulcsh-1a.c | 19 ++--- .../i386/avx512fp16vl-vfcmulcph-1a.c | 20 +++--- .../i386/avx512fp16vl-vfmulcph-1a.c | 20 +++--- 8 files changed, 136 insertions(+), 36 deletions(-) diff --git a/gcc/config/i386/avx512fp16intrin.h b/gcc/config/i386/avx512fp16intrin.h index 44c5e24f234..fe73e693897 100644 --- a/gcc/config/i386/avx512fp16intrin.h +++ b/gcc/config/i386/avx512fp16intrin.h @@ -7162,6 +7162,45 @@ _mm512_set1_pch (_Float16 _Complex __A) return (__m512h) _mm512_set1_ps (u.b); } +// intrinsics below are alias for f*mul_*ch #define _mm512_mul_pch(A, +B) _mm512_fmul_pch ((A), (B)) +#define _mm512_mask_mul_pch(W, U, A, B) \ + _mm512_mask_fmul_pch ((W), (U), (A), (B)) #define +_mm512_maskz_mul_pch(U, A, B) _mm512_maskz_fmul_pch ((U), (A), (B)) +#define _mm512_mul_round_pch(A, B, R) _mm512_fmul_round_pch ((A), (B), (R)) +#define _mm512_mask_mul_round_pch(W, U, A, B, R) \ + _mm512_mask_fmul_round_pch ((W), (U), (A), (B), (R)) +#define _mm512_maskz_mul_round_pch(U, A, B, R) \ + _mm512_maskz_fmul_round_pch ((U), (A), (B), (R)) + +#define _mm512_cmul_pch(A, B) _mm512_fcmul_pch ((A), (B)) +#define _mm512_mask_cmul_pch(W, U, A, B) \ + _mm512_mask_fcmul_pch ((W), (U), (A), (B)) #define +_mm512_maskz_cmul_pch(U, A, B) _mm512_maskz_fcmul_pch ((U), (A), (B)) +#define _mm512_cmul_round_pch(A, B, R) _mm512_fcmul_round_pch ((A), (B), (R)) +#define _mm512_mask_cmul_round_pch(W, U, A, B, R)\ + _mm512_mask_fcmul_round_pch ((W), (U), (A), (B), (R)) +#define _mm512_maskz_cmul_round_pch(U, A, B, R) \ + _mm512_maskz_fcmul_round_pch ((U), (A), (B), (R)) + +#define _mm_mul_sch(A, B) _mm_fmul_sch ((A), (B)) #define +_mm_mask_mul_sch(W, U, A, B) _mm_mask_fmul_sch ((W), (U), (A), (B)) +#define 
_mm_maskz_mul_sch(U, A, B) _mm_maskz_fmul_sch ((U), (A), (B)) +#define _mm_mul_round_sch(A, B, R) _mm_fmul_round_sch ((A), (B), (R)) +#define _mm_mask_mul_round_sch(W, U, A, B, R)\ + _mm_mask_fmul_round_sch ((W), (U), (A), (B), (R)) +#define _mm_maskz_mul_round_sch(U, A, B, R) \ + _mm_maskz_fmul_round_sch ((U), (A), (B), (R)) + +#define _mm_cmul_sch(A, B) _mm_fcmul_sch ((A), (B)) #define +_mm_mask_cmul_sch(W, U, A, B) _mm_mask_fcmul_sch ((W), (U), (A), (B)) +#define _mm_maskz_cmul_sch(U, A, B) _mm_maskz_fcmul_sch ((U), (A), (B)) +#define _mm_cmul_round_sch(A, B, R) _mm_fcmul_round_sch ((A), (B), (R)) +#define
Re: [PATCH] i386: add alias for f*mul_*ch intrinsics
On Tue, Nov 16, 2021 at 4:23 PM Kong, Lingling via Gcc-patches wrote:
>
> Hi,
>
> This patch is to add alias for f*mul_*ch intrinsics.
>
> Ok for master?

This patch just adds some macro definitions (new aliases for
intrinsics) to the header file, and I think this should be low risk.
Considering that the Intel Intrinsics Guide has been updated with those
aliases, it would be inconvenient if they were not in the latest GCC,
so I think we should install this.

OK if there are no other objections.

>
> gcc/ChangeLog:
>
> * config/i386/avx512fp16intrin.h (_mm512_mul_pch): Add alias for
> _mm512_fmul_pch.
> (_mm512_mask_mul_pch): Likewise.
> (_mm512_maskz_mul_pch): Likewise.
> (_mm512_mul_round_pch): Likewise.
> (_mm512_mask_mul_round_pch): Likewise.
> (_mm512_maskz_mul_round_pch): Likewise.
> (_mm512_cmul_pch): Likewise.
> (_mm512_mask_cmul_pch): Likewise.
> (_mm512_maskz_cmul_pch): Likewise.
> (_mm512_cmul_round_pch): Likewise.
> (_mm512_mask_cmul_round_pch): Likewise.
> (_mm512_maskz_cmul_round_pch): Likewise.
> (_mm_mul_sch): Likewise.
> (_mm_mask_mul_sch): Likewise.
> (_mm_maskz_mul_sch): Likewise.
> (_mm_mul_round_sch): Likewise.
> (_mm_mask_mul_round_sch): Likewise.
> (_mm_maskz_mul_round_sch): Likewise.
> (_mm_cmul_sch): Likewise.
> (_mm_mask_cmul_sch): Likewise.
> (_mm_maskz_cmul_sch): Likewise.
> (_mm_cmul_round_sch): Likewise.
> (_mm_mask_cmul_round_sch): Likewise.
> (_mm_maskz_cmul_round_sch): Likewise.
> * config/i386/avx512fp16vlintrin.h (_mm_mul_pch): Likewise.
> (_mm_mask_mul_pch): Likewise.
> (_mm_maskz_mul_pch): Likewise.
> (_mm256_mul_pch): Likewise.
> (_mm256_mask_mul_pch): Likewise.
> (_mm256_maskz_mul_pch): Likewise.
> (_mm_cmul_pch): Likewise.
> (_mm_mask_cmul_pch): Likewise.
> (_mm_maskz_cmul_pch): Likewise.
> (_mm256_cmul_pch): Likewise.
> (_mm256_mask_cmul_pch): Likewise.
> (_mm256_maskz_cmul_pch): Likewise.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/i386/avx512fp16-vfcmulcph-1a.c: Add new test for alias.
> * gcc.target/i386/avx512fp16-vfcmulcsh-1a.c: Likewise. > * gcc.target/i386/avx512fp16-vfmulcph-1a.c: Likewise. > * gcc.target/i386/avx512fp16-vfmulcsh-1a.c: Likewise. > * gcc.target/i386/avx512fp16vl-vfcmulcph-1a.c: Likewise. > * gcc.target/i386/avx512fp16vl-vfmulcph-1a.c: Likewise. > --- > gcc/config/i386/avx512fp16intrin.h| 39 +++ > gcc/config/i386/avx512fp16vlintrin.h | 17 > .../gcc.target/i386/avx512fp16-vfcmulcph-1a.c | 19 ++--- > .../gcc.target/i386/avx512fp16-vfcmulcsh-1a.c | 19 ++--- > .../gcc.target/i386/avx512fp16-vfmulcph-1a.c | 19 ++--- > .../gcc.target/i386/avx512fp16-vfmulcsh-1a.c | 19 ++--- > .../i386/avx512fp16vl-vfcmulcph-1a.c | 20 +++--- > .../i386/avx512fp16vl-vfmulcph-1a.c | 20 +++--- > 8 files changed, 136 insertions(+), 36 deletions(-) > > diff --git a/gcc/config/i386/avx512fp16intrin.h > b/gcc/config/i386/avx512fp16intrin.h > index 44c5e24f234..fe73e693897 100644 > --- a/gcc/config/i386/avx512fp16intrin.h > +++ b/gcc/config/i386/avx512fp16intrin.h > @@ -7162,6 +7162,45 @@ _mm512_set1_pch (_Float16 _Complex __A) >return (__m512h) _mm512_set1_ps (u.b); } > > +// intrinsics below are alias for f*mul_*ch #define _mm512_mul_pch(A, > +B) _mm512_fmul_pch ((A), (B)) > +#define _mm512_mask_mul_pch(W, U, A, B) > \ > + _mm512_mask_fmul_pch ((W), (U), (A), (B)) #define > +_mm512_maskz_mul_pch(U, A, B) _mm512_maskz_fmul_pch ((U), (A), (B)) > +#define _mm512_mul_round_pch(A, B, R) _mm512_fmul_round_pch ((A), (B), (R)) > +#define _mm512_mask_mul_round_pch(W, U, A, B, R) \ > + _mm512_mask_fmul_round_pch ((W), (U), (A), (B), (R)) > +#define _mm512_maskz_mul_round_pch(U, A, B, R) \ > + _mm512_maskz_fmul_round_pch ((U), (A), (B), (R)) > + > +#define _mm512_cmul_pch(A, B) _mm512_fcmul_pch ((A), (B)) > +#define _mm512_mask_cmul_pch(W, U, A, B) \ > + _mm512_mask_fcmul_pch ((W), (U), (A), (B)) #define > +_mm512_maskz_cmul_pch(U, A, B) _mm512_maskz_fcmul_pch ((U), (A), (B)) > +#define _mm512_cmul_round_pch(A, B, R) _mm512_fcmul_round_pch ((A), (B), (R)) > +#define 
_mm512_mask_cmul_round_pch(W, U, A, B, R)\ > + _mm512_mask_fcmul_round_pch ((W), (U), (A), (B), (R)) > +#define _mm512_maskz_cmul_round_pch(U, A, B, R) > \ > + _mm512_maskz_fcmul_round_pch ((U), (A), (B), (R)) > + > +#define _mm_mul_sch(A, B) _mm_fmul_sch ((A), (B)) #define > +_mm_mask_mul_sch(W, U, A, B) _mm_mask_fmul_sch ((W), (U), (A), (B)) > +#define _mm_maskz_mul_sch(U, A, B) _mm_maskz_fmul_sch ((U), (A), (B)) > +#def
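The aliasing technique discussed in this thread is a function-like macro that forwards to an existing name. A minimal sketch of the pattern follows. The names my_fmul and my_mul are hypothetical stand-ins chosen so the example builds without AVX512-FP16 support; they are not the intrinsics from the patch.

```c
/* Hypothetical stand-in for an existing intrinsic such as
   _mm512_fmul_pch; it only multiplies two ints so the sketch builds
   on any target.  */
static inline int
my_fmul (int a, int b)
{
  return a * b;
}

/* Alias in the style of the patch: the new name forwards to the old
   one, with each argument parenthesized as the patch does.  */
#define my_mul(A, B) my_fmul ((A), (B))
```

Because the alias is a macro rather than a function, it adds no new symbol and has no address of its own; a call such as my_mul (2 + 1, 4) simply expands to my_fmul ((2 + 1), (4)).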
Re: [PATCH] i386: vcvtph2ps and vcvtps2ph should be used to convert _Float16 to SFmode with -mf16c [PR 102811]
On Tue, Nov 16, 2021 at 4:15 PM Kong, Lingling via Gcc-patches wrote: > > Hi, > > vcvtph2ps and vcvtps2ph should be used to convert _Float16 to SFmode with > -mf16c. So added define_insn extendhfsf2 and truncsfhf2 for target_f16c. > > OK for master? > > gcc/ChangeLog: > > PR target/102811 > * config/i386/i386.md (extendhfsf2): Add extenndhfsf2 for f16c. > (extendhfdf2): Split extendhf2 into separate extendhfsf2, > extendhfdf2. > (truncsfhf2): Likewise. > (truncdfhf2): Likewise. > > gcc/testsuite/ChangeLog: > > PR target/102811 > * gcc.target/i386/avx512vl-vcvtps2ph-pr102811.c: New test. > --- > gcc/config/i386/i386.md | 48 +++ > .../i386/avx512vl-vcvtps2ph-pr102811.c| 10 > 2 files changed, 49 insertions(+), 9 deletions(-) create mode 100644 > gcc/testsuite/gcc.target/i386/avx512vl-vcvtps2ph-pr102811.c > > diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md index > 6eb9de81921..c5415475342 100644 > --- a/gcc/config/i386/i386.md > +++ b/gcc/config/i386/i386.md > @@ -4574,15 +4574,30 @@ >emit_move_insn (operands[0], CONST0_RTX (V2DFmode)); > }) > > -(define_insn "extendhf2" > - [(set (match_operand:MODEF 0 "nonimm_ssenomem_operand" "=v") > -(float_extend:MODEF > +(define_insn "extendhfsf2" > + [(set (match_operand:SF 0 "register_operand" "=v") > + (float_extend:SF > + (match_operand:HF 1 "nonimmediate_operand" "vm")))] > + "TARGET_AVX512FP16 || TARGET_F16C || TARGET_AVX512VL" > +{ > + if (TARGET_AVX512FP16) > +return "vcvtsh2ss\t{%1, %0, %0|%0, %0, %1}"; > + else > +return "vcvtph2ps\t{%1, %0|%0, %1}"; } > + [(set_attr "type" "ssecvt") > + (set_attr "prefix" "maybe_evex") > + (set_attr "mode" "SF")]) > + > +(define_insn "extendhfdf2" > + [(set (match_operand:DF 0 "nonimm_ssenomem_operand" "=v") > + (float_extend:DF > (match_operand:HF 1 "nonimmediate_operand" "vm")))] >"TARGET_AVX512FP16" > - "vcvtsh2\t{%1, %0, %0|%0, %0, %1}" > + "vcvtsh2sd\t{%1, %0, %0|%0, %0, %1}" >[(set_attr "type" "ssecvt") > (set_attr "prefix" "evex") > - (set_attr "mode" "")]) > + 
(set_attr "mode" "DF")]) > > > (define_expand "extendxf2" > @@ -4766,12 +4781,27 @@ > > ;; Conversion from {SF,DF}mode to HFmode. > > -(define_insn "trunchf2" > +(define_insn "truncsfhf2" > + [(set (match_operand:HF 0 "register_operand" "=v") > + (float_truncate:HF > + (match_operand:SF 1 "nonimmediate_operand" "vm")))] > + "TARGET_AVX512FP16 || TARGET_F16C || TARGET_AVX512VL" > + { > +if (TARGET_AVX512FP16) > + return "vcvtss2sh\t{%1, %d0|%d0, %1}"; > +else > + return "vcvtps2ph\t{0, %1, %0|%0, %1, 0}"; > + } > + [(set_attr "type" "ssecvt") > + (set_attr "prefix" "evex") > + (set_attr "mode" "HF")]) > + > +(define_insn "truncdfhf2" >[(set (match_operand:HF 0 "register_operand" "=v") > - (float_truncate:HF > - (match_operand:MODEF 1 "nonimmediate_operand" "vm")))] > + (float_truncate:HF > + (match_operand:DF 1 "nonimmediate_operand" "vm")))] >"TARGET_AVX512FP16" > - "vcvt2sh\t{%1, %d0|%d0, %1}" > + "vcvtsd2sh\t{%1, %d0|%d0, %1}" >[(set_attr "type" "ssecvt") > (set_attr "prefix" "evex") > (set_attr "mode" "HF")]) > diff --git a/gcc/testsuite/gcc.target/i386/avx512vl-vcvtps2ph-pr102811.c > b/gcc/testsuite/gcc.target/i386/avx512vl-vcvtps2ph-pr102811.c > new file mode 100644 > index 000..ab44a304a03 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/i386/avx512vl-vcvtps2ph-pr102811.c > @@ -0,0 +1,10 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O2 -mf16c -mno-avx512fp16" } */ > +/* { dg-final { scan-assembler-times "vcvtph2ps\[ \\t\]" 2 } } */ > +/* { dg-final { scan-assembler-times "vcvtps2ph\[ \\t\]" 1 } } */ > +/* { dg-final { scan-assembler-not "__truncsfhf2\[ \\t\]"} } */ > +/* { dg-final { scan-assembler-not "__extendhfsf2\[ \\t\]"} } */ > +_Float16 test (_Float16 a, _Float16 b) > +{ > + return a + b; > +} > -- > 2.18.1 > -- BR, Hongtao
Re: [PATCH] x86_64: Avoid rorx rotation instructions with -Os
On Mon, Nov 15, 2021 at 2:54 PM Roger Sayle wrote:
>
>
> This patch teaches the i386 backend to avoid using BMI2's rorx
> instructions when optimizing for size. The benefits are shown
> with the following example:
>
> unsigned int ror1(unsigned int x) { return (x >> 1) | (x << 31); }
> unsigned int ror2(unsigned int x) { return (x >> 2) | (x << 30); }
> unsigned int rol2(unsigned int x) { return (x >> 30) | (x << 2); }
> unsigned int rol1(unsigned int x) { return (x >> 31) | (x << 1); }
>
> which currently with -Os -march=cascadelake generates:
>
> ror1:   rorx    $1, %edi, %eax  // 6 bytes
>         ret
> ror2:   rorx    $2, %edi, %eax  // 6 bytes
>         ret
> rol2:   rorx    $30, %edi, %eax // 6 bytes
>         ret
> rol1:   rorx    $31, %edi, %eax // 6 bytes
>         ret
>
> but with this patch now generates:
>
> ror1:   movl    %edi, %eax      // 2 bytes
>         rorl    %eax            // 2 bytes
>         ret
> ror2:   movl    %edi, %eax      // 2 bytes
>         rorl    $2, %eax        // 3 bytes
>         ret
> rol2:   movl    %edi, %eax      // 2 bytes
>         roll    $2, %eax        // 3 bytes
>         ret
> rol1:   movl    %edi, %eax      // 2 bytes
>         roll    %eax            // 2 bytes
>         ret
>
> I've confirmed that this patch is a win on the CSiBE benchmark,
> even though rotations are rare, where for example libmspack/test/md5.o
> shrinks from 5824 bytes to 5632 bytes.
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check with no new failures. Ok for mainline?
>
>
> 2021-11-15 Roger Sayle
>
> gcc/ChangeLog
> * config/i386/i386.md (*bmi2_rorx_1): Make conditional
> on !optimize_function_for_size_p.
> (*3_1): Add preferred_for_size attribute.
> (define_splits): Conditionalize on !optimize_function_for_size_p.
> (*bmi2_rorxsi3_1_zext): Likewise.
> (*si2_1_zext): Add preferred_for_size attribute.
> (define_splits): Conditionalize on !optimize_function_for_size_p.

OK.

Thanks,
Uros.
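The listing shows rol2 being emitted as a rotate right by 30, which relies on a left rotation by n being the same as a right rotation by 32 - n on a 32-bit value. A minimal sketch of that identity, using the same shift/or idiom as the example; the helper names ror32 and rol32 are illustrative and do not come from the patch:

```c
#include <stdint.h>

/* Rotate X right by N bits, for 0 < N < 32, using the shift/or idiom
   that the compiler recognizes as a rotation.  */
static uint32_t
ror32 (uint32_t x, unsigned n)
{
  return (x >> n) | (x << (32 - n));
}

/* Rotate X left by N bits, for 0 < N < 32.  */
static uint32_t
rol32 (uint32_t x, unsigned n)
{
  return (x << n) | (x >> (32 - n));
}
```

Since rorx only rotates right, every left rotation has to be expressed this way; with the patch, -Os can use roll directly, and per the byte counts quoted above the mov plus rotate pair takes 4 or 5 bytes against 6 for rorx.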
Re: [PATCH] gcc: implement AIX-style constructors
> Hi David,
>
> Here is the new version of the patch.
> I've moved the startup function into the crtcdtors files.
>
> I'm just wondering if the part dealing with the
> __init_aix_libgcc_cxa_atexit is needed. I'm adding it because
> the destructor created in crtcxa.o is following GCC format and
> thus won't be launched if the flag "-mcdtors=aix" is passed.
> However, as you said, this option might not operate correctly
> if the GCC runtime isn't rebuilt with it.

Gentle Ping.

Thanks,
Clément
Re: aix: Add FAT library support for libffi for AIX
> Even if GCC64 is able to bootstrap without libffi being a
> FAT library on AIX, the tests for "-maix32" are not working
> without it.
>
> libffi/ChangeLog:
> 2021-10-21 Clément Chigot
>
> * Makefile.am (tmake_file): Build and install AIX-style FAT
> libraries.
> * Makefile.in: Regenerate.
> * include/Makefile.in: Regenerate.
> * man/Makefile.in: Regenerate.
> * testsuite/Makefile.in: Regenerate.
> * configure (tmake_file): Substitute.
> * configure.ac: Regenerate.
> * configure.host (powerpc-*-aix*): Define tmake_file.
> * src/powerpc/t-aix: New file.
>
> I've already made a PR to libffi itself in order to add the common part of
> this patch to it. But for now, it's still unmerged:
> https://github.com/libffi/libffi/pull/661.

Gentle ping,

Thanks
Clément
Re: [PATCH] aix: handle 64bit inodes for include directories
Hi everyone,

Gentle ping

Thanks,
Clément

From: CHIGOT, CLEMENT
Sent: Tuesday, October 26, 2021 4:51 PM
To: Jeff Law; David Malcolm
Cc: gcc-patches@gcc.gnu.org; David Edelsohn
Subject: Re: [PATCH] aix: handle 64bit inodes for include directories

Hi everyone,

Gentle ping on this patch.

Clément

From: CHIGOT, CLEMENT
Sent: Tuesday, October 12, 2021 10:35 AM
To: Jeff Law; David Malcolm
Cc: gcc-patches@gcc.gnu.org; David Edelsohn
Subject: Re: [PATCH] aix: handle 64bit inodes for include directories

Hi Jeff,

Any update on this patch? As it's dealing with configure files, I would
like to have it merged asap before any conflicts appear.

Thanks,
Clément
Re: [PATCH] i386: vcvtph2ps and vcvtps2ph should be used to convert _Float16 to SFmode with -mf16c [PR 102811]
On Tue, Nov 16, 2021 at 9:15 AM Kong, Lingling via Gcc-patches wrote: > > Hi, > > vcvtph2ps and vcvtps2ph should be used to convert _Float16 to SFmode with > -mf16c. So added define_insn extendhfsf2 and truncsfhf2 for target_f16c. > > OK for master? No, this is the wrong approach. There can be invalid values in the high elements of the vector, so these should be cleared before conversion. Please see the attached (unfinished) patch and use it as a starting point. Please note that we can now allow 2-byte values in SSE registers, so movhi_internal and ix86_can_change_mode_class should be updated accordingly. Uros. > > gcc/ChangeLog: > > PR target/102811 > * config/i386/i386.md (extendhfsf2): Add extenndhfsf2 for f16c. > (extendhfdf2): Split extendhf2 into separate extendhfsf2, > extendhfdf2. > (truncsfhf2): Likewise. > (truncdfhf2): Likewise. > > gcc/testsuite/ChangeLog: > > PR target/102811 > * gcc.target/i386/avx512vl-vcvtps2ph-pr102811.c: New test. > --- > gcc/config/i386/i386.md | 48 +++ > .../i386/avx512vl-vcvtps2ph-pr102811.c| 10 > 2 files changed, 49 insertions(+), 9 deletions(-) create mode 100644 > gcc/testsuite/gcc.target/i386/avx512vl-vcvtps2ph-pr102811.c > > diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md index > 6eb9de81921..c5415475342 100644 > --- a/gcc/config/i386/i386.md > +++ b/gcc/config/i386/i386.md > @@ -4574,15 +4574,30 @@ >emit_move_insn (operands[0], CONST0_RTX (V2DFmode)); > }) > > -(define_insn "extendhf2" > - [(set (match_operand:MODEF 0 "nonimm_ssenomem_operand" "=v") > -(float_extend:MODEF > +(define_insn "extendhfsf2" > + [(set (match_operand:SF 0 "register_operand" "=v") > + (float_extend:SF > + (match_operand:HF 1 "nonimmediate_operand" "vm")))] > + "TARGET_AVX512FP16 || TARGET_F16C || TARGET_AVX512VL" > +{ > + if (TARGET_AVX512FP16) > +return "vcvtsh2ss\t{%1, %0, %0|%0, %0, %1}"; > + else > +return "vcvtph2ps\t{%1, %0|%0, %1}"; } > + [(set_attr "type" "ssecvt") > + (set_attr "prefix" "maybe_evex") > + (set_attr "mode" 
"SF")]) > + > +(define_insn "extendhfdf2" > + [(set (match_operand:DF 0 "nonimm_ssenomem_operand" "=v") > + (float_extend:DF > (match_operand:HF 1 "nonimmediate_operand" "vm")))] >"TARGET_AVX512FP16" > - "vcvtsh2\t{%1, %0, %0|%0, %0, %1}" > + "vcvtsh2sd\t{%1, %0, %0|%0, %0, %1}" >[(set_attr "type" "ssecvt") > (set_attr "prefix" "evex") > - (set_attr "mode" "")]) > + (set_attr "mode" "DF")]) > > > (define_expand "extendxf2" > @@ -4766,12 +4781,27 @@ > > ;; Conversion from {SF,DF}mode to HFmode. > > -(define_insn "trunchf2" > +(define_insn "truncsfhf2" > + [(set (match_operand:HF 0 "register_operand" "=v") > + (float_truncate:HF > + (match_operand:SF 1 "nonimmediate_operand" "vm")))] > + "TARGET_AVX512FP16 || TARGET_F16C || TARGET_AVX512VL" > + { > +if (TARGET_AVX512FP16) > + return "vcvtss2sh\t{%1, %d0|%d0, %1}"; > +else > + return "vcvtps2ph\t{0, %1, %0|%0, %1, 0}"; > + } > + [(set_attr "type" "ssecvt") > + (set_attr "prefix" "evex") > + (set_attr "mode" "HF")]) > + > +(define_insn "truncdfhf2" >[(set (match_operand:HF 0 "register_operand" "=v") > - (float_truncate:HF > - (match_operand:MODEF 1 "nonimmediate_operand" "vm")))] > + (float_truncate:HF > + (match_operand:DF 1 "nonimmediate_operand" "vm")))] >"TARGET_AVX512FP16" > - "vcvt2sh\t{%1, %d0|%d0, %1}" > + "vcvtsd2sh\t{%1, %d0|%d0, %1}" >[(set_attr "type" "ssecvt") > (set_attr "prefix" "evex") > (set_attr "mode" "HF")]) > diff --git a/gcc/testsuite/gcc.target/i386/avx512vl-vcvtps2ph-pr102811.c > b/gcc/testsuite/gcc.target/i386/avx512vl-vcvtps2ph-pr102811.c > new file mode 100644 > index 000..ab44a304a03 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/i386/avx512vl-vcvtps2ph-pr102811.c > @@ -0,0 +1,10 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O2 -mf16c -mno-avx512fp16" } */ > +/* { dg-final { scan-assembler-times "vcvtph2ps\[ \\t\]" 2 } } */ > +/* { dg-final { scan-assembler-times "vcvtps2ph\[ \\t\]" 1 } } */ > +/* { dg-final { scan-assembler-not "__truncsfhf2\[ \\t\]"} } */ > +/* { dg-final { 
scan-assembler-not "__extendhfsf2\[ \\t\]"} } */ > +_Float16 test (_Float16 a, _Float16 b) > +{ > + return a + b; > +} > -- > 2.18.1 > diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index 9cc903e826b..21a3a45d22c 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -19462,9 +19462,8 @@ ix86_can_change_mode_class (machine_mode from, machine_mode to, disallow a change to these modes, reload will assume it's ok to drop the subreg from (subreg:SI (reg:HI 100) 0). This affects the vec_dupv4hi pattern. -NB: AVX512FP16 supports vmovw which can load 16bit data to sse -register. */ - int mov_size = MAYBE_SSE_CLASS_P (regclass) &&
Re: [PATCH] pch: Add support for PCH for relocatable executables
On Sat, Nov 13, 2021 at 08:32:41PM +, Iain Sandoe wrote:
> IMO both this series
> - which restores the ability to work with PIE exes but requires a known
>   address for the PCH
> and the series I posted
> - which allows a configuration to opt out of PCH anyway
>
> could be useful - for Darwin I prefer this series.

Yeah, I think we want both and let the users choose.
Finding a hole can indeed be hard on 32-bit VA, but no OS I've seen
randomizes across the whole 44 or 48 or however many bits of VA; otherwise
e.g. address sanitizer or thread sanitizer would have no chance to work
either.

Making the PCH blob relocatable would be achievable too: we have all the
information in the GTY for it after all, given that we are able to relocate
it at PCH saving time. We don't do that currently because it would be more
expensive at PCH restore time. But it is perhaps better to do that as a
fallback if we don't manage to get the right slot.

	Jakub
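
To illustrate why relocating at restore time costs more than mapping the blob at a fixed address, here is a minimal sketch in C. It is not GCC's PCH code: `relocate_blob`, the `slots` offset table and the base addresses are all hypothetical names made up for this example, standing in for the pointer-location information that GTY would provide. The point is only that every stored pointer has to be visited and rebased by the difference between the old and new load address.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Rebase every pointer stored in BLOB from OLD_BASE to NEW_BASE.
   SLOTS holds the byte offsets of the pointer-sized fields inside the
   blob (hypothetical; a stand-in for what GTY would describe).  */
void
relocate_blob (unsigned char *blob, const size_t *slots, size_t nslots,
               uintptr_t old_base, uintptr_t new_base)
{
  for (size_t i = 0; i < nslots; i++)
    {
      uintptr_t p;
      /* memcpy avoids any alignment assumptions about the slot.  */
      memcpy (&p, blob + slots[i], sizeof p);
      /* Null pointers stay null; everything else moves by the delta.  */
      if (p != 0)
        p = p - old_base + new_base;
      memcpy (blob + slots[i], &p, sizeof p);
    }
}
```

When the blob lands at the address it was saved for, this walk can be skipped entirely, which is presumably why relocation is only attractive as a fallback.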
[PATCH] waccess: Fix up pass_waccess::check_alloc_size_call [PR102009]
Hi! This function punts if the builtins have no arguments, but as can be seen on the testcase, even if it has some arguments but alloc_size attribute's arguments point to arguments that aren't passed, we get a warning earlier from the FE but should punt rather than ICE on it. Other users of alloc_size attribute e.g. in tree-object-size.c (alloc_object_size) punt similarly and similarly even in the same TU maybe_warn_nonstring_arg correctly verifies calls have enough arguments. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2021-11-16 Jakub Jelinek PR tree-optimization/102009 * gimple-ssa-warn-access.cc (pass_waccess::check_alloc_size_call): Punt if any of alloc_size arguments is out of bounds vs. number of call arguments. * gcc.dg/pr102009.c: New test. --- gcc/gimple-ssa-warn-access.cc.jj2021-11-09 15:25:15.0 +0100 +++ gcc/gimple-ssa-warn-access.cc 2021-11-15 17:22:44.769580185 +0100 @@ -2335,10 +2335,6 @@ pass_waccess::check_alloca (gcall *stmt) void pass_waccess::check_alloc_size_call (gcall *stmt) { - if (gimple_call_num_args (stmt) < 1) -/* Avoid invalid calls to functions without a prototype. */ -return; - tree fndecl = gimple_call_fndecl (stmt); if (fndecl && gimple_call_builtin_p (stmt, BUILT_IN_NORMAL)) { @@ -2367,13 +2363,19 @@ pass_waccess::check_alloc_size_call (gca the actual argument(s) at those indices in ALLOC_ARGS. */ int idx[2] = { -1, -1 }; tree alloc_args[] = { NULL_TREE, NULL_TREE }; + unsigned nargs = gimple_call_num_args (stmt); tree args = TREE_VALUE (alloc_size); idx[0] = TREE_INT_CST_LOW (TREE_VALUE (args)) - 1; + /* Avoid invalid calls to functions without a prototype. 
*/ + if ((unsigned) idx[0] >= nargs) +return; alloc_args[0] = call_arg (stmt, idx[0]); if (TREE_CHAIN (args)) { idx[1] = TREE_INT_CST_LOW (TREE_VALUE (TREE_CHAIN (args))) - 1; + if ((unsigned) idx[1] >= nargs) + return; alloc_args[1] = call_arg (stmt, idx[1]); } --- gcc/testsuite/gcc.dg/pr102009.c.jj 2021-11-15 17:29:19.090162531 +0100 +++ gcc/testsuite/gcc.dg/pr102009.c 2021-11-15 17:30:08.328486037 +0100 @@ -0,0 +1,10 @@ +/* PR tree-optimization/102009 */ +/* { dg-do compile } */ + +void *realloc (); /* { dg-message "declared here" } */ + +void * +foo (void *p) +{ + return realloc (p); /* { dg-warning "too few arguments to built-in function 'realloc' expecting " } */ +} Jakub
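
The bounds check in the patch relies on a common idiom: casting a possibly negative index to unsigned so that one comparison rejects both negative and too-large values. A minimal stand-alone sketch follows; `struct call` and `alloc_size_arg` are hypothetical names for this illustration only and are not GCC internals.

```c
#include <stddef.h>

/* Hypothetical stand-in for a call statement: just its actual arguments.  */
struct call
{
  unsigned nargs;
  const char *args[4];
};

/* Return the argument named by a 1-based alloc_size position, or NULL
   if the call does not pass that many arguments.  */
const char *
alloc_size_arg (const struct call *c, int pos)
{
  int idx = pos - 1;
  /* Converting to unsigned turns a negative index into a huge value,
     so a single comparison covers idx < 0 and idx >= nargs.  */
  if ((unsigned) idx >= c->nargs)
    return NULL;
  return c->args[idx];
}

/* Mirrors the shape of the testcase: a call passing one argument while
   the size is expected in the second position.  */
int
one_arg_call_is_rejected (void)
{
  struct call c = { 1, { "p" } };
  return alloc_size_arg (&c, 2) == NULL;
}
```

With this shape, a caller can simply punt on a NULL result instead of indexing past the end of the argument list.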
Re: [PATCH] waccess: Fix up pass_waccess::check_alloc_size_call [PR102009]
On Tue, 16 Nov 2021, Jakub Jelinek wrote: > Hi! > > This function punts if the builtins have no arguments, but as can be seen > on the testcase, even if it has some arguments but alloc_size attribute's > arguments point to arguments that aren't passed, we get a warning earlier > from the FE but should punt rather than ICE on it. > Other users of alloc_size attribute e.g. in > tree-object-size.c (alloc_object_size) punt similarly and similarly > even in the same TU maybe_warn_nonstring_arg correctly verifies calls have > enough arguments. > > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? OK. Thanks, Richard. > 2021-11-16 Jakub Jelinek > > PR tree-optimization/102009 > * gimple-ssa-warn-access.cc (pass_waccess::check_alloc_size_call): > Punt if any of alloc_size arguments is out of bounds vs. number of > call arguments. > > * gcc.dg/pr102009.c: New test. > > --- gcc/gimple-ssa-warn-access.cc.jj 2021-11-09 15:25:15.0 +0100 > +++ gcc/gimple-ssa-warn-access.cc 2021-11-15 17:22:44.769580185 +0100 > @@ -2335,10 +2335,6 @@ pass_waccess::check_alloca (gcall *stmt) > void > pass_waccess::check_alloc_size_call (gcall *stmt) > { > - if (gimple_call_num_args (stmt) < 1) > -/* Avoid invalid calls to functions without a prototype. */ > -return; > - >tree fndecl = gimple_call_fndecl (stmt); >if (fndecl && gimple_call_builtin_p (stmt, BUILT_IN_NORMAL)) > { > @@ -2367,13 +2363,19 @@ pass_waccess::check_alloc_size_call (gca > the actual argument(s) at those indices in ALLOC_ARGS. */ >int idx[2] = { -1, -1 }; >tree alloc_args[] = { NULL_TREE, NULL_TREE }; > + unsigned nargs = gimple_call_num_args (stmt); > >tree args = TREE_VALUE (alloc_size); >idx[0] = TREE_INT_CST_LOW (TREE_VALUE (args)) - 1; > + /* Avoid invalid calls to functions without a prototype. 
*/ > + if ((unsigned) idx[0] >= nargs) > +return; >alloc_args[0] = call_arg (stmt, idx[0]); >if (TREE_CHAIN (args)) > { >idx[1] = TREE_INT_CST_LOW (TREE_VALUE (TREE_CHAIN (args))) - 1; > + if ((unsigned) idx[1] >= nargs) > + return; >alloc_args[1] = call_arg (stmt, idx[1]); > } > > --- gcc/testsuite/gcc.dg/pr102009.c.jj2021-11-15 17:29:19.090162531 > +0100 > +++ gcc/testsuite/gcc.dg/pr102009.c 2021-11-15 17:30:08.328486037 +0100 > @@ -0,0 +1,10 @@ > +/* PR tree-optimization/102009 */ > +/* { dg-do compile } */ > + > +void *realloc ();/* { dg-message "declared here" } */ > + > +void * > +foo (void *p) > +{ > + return realloc (p);/* { dg-warning "too few arguments to built-in > function 'realloc' expecting " } */ > +} > > Jakub > > -- Richard Biener SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg, Germany; GF: Ivo Totev; HRB 36809 (AG Nuernberg)
[committed] openmp: Regimplify operands of GIMPLE_COND in a few more places [PR103208]
Hi! As the testcase shows, the non-rectangular loop expansion code didn't try to regimplify operands of GIMPLE_CONDs it built in some cases. I have added a helper function which does that and used it in some places that were regimplifying already to simplify those spots, plus added it in a couple of other places where it was needed. Bootstrapped/regtested on x86_64-linux and i686-linux, committed to trunk. 2021-11-16 Jakub Jelinek PR tree-optimization/103208 * omp-expand.c (expand_omp_build_cond): New function. (expand_omp_for_init_counts, expand_omp_for_init_vars, expand_omp_for_static_nochunk, expand_omp_for_static_chunk): Use it. * c-c++-common/gomp/loop-11.c: New test. --- gcc/omp-expand.c.jj 2021-11-11 14:35:37.631348121 +0100 +++ gcc/omp-expand.c2021-11-15 20:39:22.666976655 +0100 @@ -1208,6 +1208,28 @@ expand_omp_build_assign (gimple_stmt_ite } } +/* Prepend or append LHS CODE RHS condition before or after *GSI_P. */ + +static gcond * +expand_omp_build_cond (gimple_stmt_iterator *gsi_p, enum tree_code code, + tree lhs, tree rhs, bool after = false) +{ + gcond *cond_stmt = gimple_build_cond (code, lhs, rhs, NULL_TREE, NULL_TREE); + if (after) +gsi_insert_after (gsi_p, cond_stmt, GSI_CONTINUE_LINKING); + else +gsi_insert_before (gsi_p, cond_stmt, GSI_SAME_STMT); + if (walk_tree (gimple_cond_lhs_ptr (cond_stmt), expand_omp_regimplify_p, +NULL, NULL) + || walk_tree (gimple_cond_rhs_ptr (cond_stmt), expand_omp_regimplify_p, + NULL, NULL)) +{ + gimple_stmt_iterator gsi = gsi_for_stmt (cond_stmt); + gimple_regimplify_operands (cond_stmt, &gsi); +} + return cond_stmt; +} + /* Expand the OpenMP parallel or task directive starting at REGION. 
*/ static void @@ -1868,17 +1890,8 @@ expand_omp_for_init_counts (struct omp_f n2 = fold_convert (itype, unshare_expr (fd->loops[i].n2)); n2 = force_gimple_operand_gsi (gsi, n2, true, NULL_TREE, true, GSI_SAME_STMT); - cond_stmt = gimple_build_cond (fd->loops[i].cond_code, n1, n2, -NULL_TREE, NULL_TREE); - gsi_insert_before (gsi, cond_stmt, GSI_SAME_STMT); - if (walk_tree (gimple_cond_lhs_ptr (cond_stmt), -expand_omp_regimplify_p, NULL, NULL) - || walk_tree (gimple_cond_rhs_ptr (cond_stmt), - expand_omp_regimplify_p, NULL, NULL)) - { - *gsi = gsi_for_stmt (cond_stmt); - gimple_regimplify_operands (cond_stmt, gsi); - } + cond_stmt = expand_omp_build_cond (gsi, fd->loops[i].cond_code, +n1, n2); e = split_block (entry_bb, cond_stmt); basic_block &zero_iter_bb = i < fd->collapse ? zero_iter1_bb : zero_iter2_bb; @@ -2075,18 +2088,16 @@ expand_omp_for_init_counts (struct omp_f n2e = force_gimple_operand_gsi (&gsi2, n2e, true, NULL_TREE, true, GSI_SAME_STMT); gcond *cond_stmt - = gimple_build_cond (fd->loops[i].cond_code, n1, n2, -NULL_TREE, NULL_TREE); - gsi_insert_before (&gsi2, cond_stmt, GSI_SAME_STMT); + = expand_omp_build_cond (&gsi2, fd->loops[i].cond_code, +n1, n2); e = split_block (bb1, cond_stmt); e->flags = EDGE_TRUE_VALUE; e->probability = profile_probability::likely ().guessed (); basic_block bb2 = e->dest; gsi2 = gsi_after_labels (bb2); - cond_stmt = gimple_build_cond (fd->loops[i].cond_code, n1e, n2e, -NULL_TREE, NULL_TREE); - gsi_insert_before (&gsi2, cond_stmt, GSI_SAME_STMT); + cond_stmt = expand_omp_build_cond (&gsi2, fd->loops[i].cond_code, +n1e, n2e); e = split_block (bb2, cond_stmt); e->flags = EDGE_TRUE_VALUE; e->probability = profile_probability::likely ().guessed (); @@ -2137,9 +2148,8 @@ expand_omp_for_init_counts (struct omp_f e->probability = profile_probability::unlikely ().guessed (); gsi2 = gsi_after_labels (bb3); - cond_stmt = gimple_build_cond (fd->loops[i].cond_code, n1e, n2e, -NULL_TREE, NULL_TREE); - gsi_insert_before (&gsi2, cond_stmt, 
GSI_SAME_STMT); + cond_stmt = expand_omp_build_cond (&gsi2, fd->loops[i].cond_code, +n1e, n2e); e = split_block (bb3, cond_stmt); e->flags = EDGE_TRUE_VALUE; e->probability = profile_probability::likely ().guessed (); @@ -2193,9 +2203,8 @@ expand_omp_for_init_counts (struct omp_f true, GSI_SAME_STMT);
[committed] libgomp: Mark thread_limit clause to target construct as implemented
On Mon, Nov 15, 2021 at 02:00:42PM +0100, Tobias Burnus wrote:
> Fortran: openmp: Add support for thread_limit clause on target
>
> gcc/fortran/ChangeLog:
>
>     * openmp.c (OMP_TARGET_CLAUSES): Add thread_limit.
>     * trans-openmp.c (gfc_split_omp_clauses): Add thread_limit also to
>     teams.

After the Fortran changes we can mark it as implemented...

Bootstrapped/regtested on x86_64-linux and i686-linux, committed to trunk.

2021-11-16  Jakub Jelinek

    * libgomp.texi (OpenMP 5.1): Mark thread_limit clause to target
    construct as implemented.

--- libgomp/libgomp.texi.jj 2021-10-27 09:24:43.312822017 +0200
+++ libgomp/libgomp.texi 2021-11-15 22:29:35.210487522 +0100
@@ -292,7 +292,7 @@ The OpenMP 4.5 specification is fully su
       clauses of the taskloop construct @tab Y @tab
 @item @code{align} clause/modifier in @code{allocate} directive/clause
       and @code{allocator} directive @tab P @tab C/C++ on clause only
-@item @code{thread_limit} clause to @code{target} construct @tab N @tab
+@item @code{thread_limit} clause to @code{target} construct @tab Y @tab
 @item @code{has_device_addr} clause to @code{target} construct @tab N @tab
 @item iterators in @code{target update} motion clauses and @code{map}
       clauses @tab N @tab

	Jakub
Re: [PATCH 1/5] libstdc++: Import the fast_float library
On Tue, 16 Nov 2021 at 08:01, Florian Weimer wrote:
>
> * Patrick Palka via Libstdc:
>
> > This copies the fast_float library[1] into the compiled-in library
> > sources. We're going to use this library in our floating-point
> > std::from_chars implementation for faster and more portable parsing of
> > binary32/64 decimal strings.
> >
> > [1]: https://github.com/fastfloat/fast_float
> >
> > Series tested on x86_64, i686, ppc64, ppc64le and aarch64, does it
> > look OK for trunk?
>
> Missing Signed-off-by:?

That's not needed if Patrick is still covered by an FSF assignment.

> > diff --git a/libstdc++-v3/src/c++17/fast_float/LICENSE-APACHE
> > b/libstdc++-v3/src/c++17/fast_float/LICENSE-APACHE
> > new file mode 100644
> > index 000..26f4398f249
> > --- /dev/null
> > +++ b/libstdc++-v3/src/c++17/fast_float/LICENSE-APACHE
> > @@ -0,0 +1,190 @@
> > + Apache License
> > + Version 2.0, January 2004
> > +http://www.apache.org/licenses/
>
> > diff --git a/libstdc++-v3/src/c++17/fast_float/LICENSE-MIT
> > b/libstdc++-v3/src/c++17/fast_float/LICENSE-MIT
> > new file mode 100644
> > index 000..2fb2a37ad7f
> > --- /dev/null
> > +++ b/libstdc++-v3/src/c++17/fast_float/LICENSE-MIT
> > @@ -0,0 +1,27 @@
> > +MIT License
> > +
> > +Copyright (c) 2021 The fast_float authors
>
> You also need to include the README file, which makes it clear that
> recipients can choose between Apache and MIT. GCC needs to use the MIT
> option, I think.

I think we could use Apache as well, because this code isn't going to
appear in public headers so the problematic clause doesn't apply. But
MIT is simpler.
Re: [PATCH] libstdc++: Merge latest Ryu sources
On Tue, 16 Nov 2021 at 00:36, Patrick Palka wrote:
>
> The only source change is a speedup to pow5Factor.
>
> Tested on x86_64-pc-linux-gnu, does this look OK for trunk?

OK, thanks.
[PATCH] ipa-sra: Testcase that removing a "returns_nonnull" retval works
Hi,

since we can now remove return values of functions with the
returns_nonnull type attribute, I'll feel a bit safer if we test that
this does not ICE when someone attempts to access a non-existent call
LHS. Eventually we should probably drop the attribute when this happens.

Tested on x86_64-linux, I will push it to master momentarily.

Martin

gcc/testsuite/ChangeLog:

2021-11-15  Martin Jambor

    * gcc.dg/ipa/ipa-sra-ret-nonull.c: New test.
---
 gcc/testsuite/gcc.dg/ipa/ipa-sra-ret-nonull.c | 40 +++
 1 file changed, 40 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/ipa/ipa-sra-ret-nonull.c

diff --git a/gcc/testsuite/gcc.dg/ipa/ipa-sra-ret-nonull.c b/gcc/testsuite/gcc.dg/ipa/ipa-sra-ret-nonull.c
new file mode 100644
index 000..18c13efd609
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/ipa/ipa-sra-ret-nonull.c
@@ -0,0 +1,40 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-ipa-sra-details" } */
+
+volatile void *gp;
+volatile void *gq;
+char buf[16];
+
+__attribute__((returns_nonnull, noinline))
+static char *
+foo (char *p, char *q)
+{
+  gq = q;
+  gp = p;
+  return q;
+}
+
+__attribute__((returns_nonnull, noinline))
+static char *
+bar (char *p, char *q)
+{
+  return foo (p, q) + 8;
+}
+
+__attribute__((noipa))
+static char *
+get_charp (void)
+{
+  return &buf[0];
+}
+
+int
+main ()
+{
+  char *r;
+  asm volatile ("" : : : "memory");
+  r = bar (get_charp (), get_charp ());
+  return 0;
+}
+
+/* { dg-final { scan-ipa-dump-times "Will SKIP return." 2 "sra" } } */
-- 
2.33.0
Re: [PATCH 1/5] libstdc++: Import the fast_float library
* Jonathan Wakely:

> On Tue, 16 Nov 2021 at 08:01, Florian Weimer wrote:
>>
>> * Patrick Palka via Libstdc:
>>
>> > This copies the fast_float library[1] into the compiled-in library
>> > sources. We're going to use this library in our floating-point
>> > std::from_chars implementation for faster and more portable parsing of
>> > binary32/64 decimal strings.
>> >
>> > [1]: https://github.com/fastfloat/fast_float
>> >
>> > Series tested on x86_64, i686, ppc64, ppc64le and aarch64, does it
>> > look OK for trunk?
>>
>> Missing Signed-off-by:?
>
> That's not needed if Patrick is still covered by an FSF assignment.

But the submission is not covered by the FSF assignment.

> I think we could use Apache as well, because this code isn't going to
> appear in public headers so the problematic clause doesn't apply. But
> MIT is simpler.

Okay, so you consider dynamic linking only?

I think the historic libstdc++ license is more permissive than Apache or
MIT when used with GCC. There aren't any notification or other
requirements.

Thanks,
Florian
Re: [PATCH] Fix PR tree-optimization/103228 and 55177: folding of (type) X op CST where type is a nop convert
On Tue, Nov 16, 2021 at 4:36 AM apinski--- via Gcc-patches wrote:
>
> From: Andrew Pinski
>
> Currently we fold (type) X op CST into (type) (X op ((type-x) CST)) when the
> conversion widens but not when the conversion is a nop. For the same reason
> why we move the widening conversion (the possibility of removing an extra
> conversion), we should do the same if the conversion is a nop.
>
> OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
>
>     PR tree-optimization/103228
>     PR tree-optimization/55177
>
> gcc/ChangeLog:
>
>     * match.pd ((type) X bitop CST): Also do this
>     transformation for nop conversions.
>
> gcc/testsuite/ChangeLog:
>
>     * gcc.dg/tree-ssa/pr103228-1.c: New test.
>     * gcc.dg/tree-ssa/pr55177-1.c: New test.
> ---
>  gcc/match.pd | 2 +-
>  gcc/testsuite/gcc.dg/tree-ssa/pr103228-1.c | 11 +++
>  gcc/testsuite/gcc.dg/tree-ssa/pr55177-1.c | 14 ++
>  3 files changed, 26 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr103228-1.c
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr55177-1.c
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index a0e9a82e4c4..dc3d5054583 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -1615,7 +1615,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>         && (bitop != BIT_AND_EXPR || GIMPLE)
>         && (/* That's a good idea if the conversion widens the operand, thus
>               after hoisting the conversion the operation will be narrower. */

Can you please adjust the comment?

OK with that change.

> -          TYPE_PRECISION (TREE_TYPE (@0)) < TYPE_PRECISION (type)
> +          TYPE_PRECISION (TREE_TYPE (@0)) <= TYPE_PRECISION (type)
>            /* It's also a good idea if the conversion is to a non-integer
>               mode. */
>            || GET_MODE_CLASS (TYPE_MODE (type)) != MODE_INT
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr103228-1.c
> b/gcc/testsuite/gcc.dg/tree-ssa/pr103228-1.c
> new file mode 100644
> index 000..a7539819cf2
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr103228-1.c
> @@ -0,0 +1,11 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-optimized" } */
> +int f(int a, int b)
> +{
> +  b|=1u;
> +  b|=2;
> +  return b;
> +}
> +/* { dg-final { scan-tree-dump-times "\\\| 3" 1 "optimized"} } */
> +/* { dg-final { scan-tree-dump-times "\\\| 1" 0 "optimized"} } */
> +/* { dg-final { scan-tree-dump-times "\\\| 2" 0 "optimized"} } */
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr55177-1.c
> b/gcc/testsuite/gcc.dg/tree-ssa/pr55177-1.c
> new file mode 100644
> index 000..de1a264345c
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr55177-1.c
> @@ -0,0 +1,14 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-optimized" } */
> +extern int x;
> +
> +void foo(void)
> +{
> +  int a = __builtin_bswap32(x);
> +  a &= 0x5a5b5c5d;
> +  x = __builtin_bswap32(a);
> +}
> +
> +/* { dg-final { scan-tree-dump-times "__builtin_bswap32" 0 "optimized"} } */
> +/* { dg-final { scan-tree-dump-times "& 1566333786" 1 "optimized"} } */
> +/* { dg-final { scan-tree-dump-times "& 1515936861" 0 "optimized"} } */
> --
> 2.17.1
>
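
As a sanity check on what the pr103228-1.c test expects, the following small C sketch compares the two forms at the source level. It is only an illustration, not part of the patch: `or_unfolded`, `or_folded` and `folds_agree` are names made up here, and the unsigned-to-int conversion of out-of-range values is implementation-defined in ISO C (GCC documents it as modulo 2^N, which is what this sketch assumes).

```c
#include <limits.h>
#include <stddef.h>

/* As in the test: the first OR is carried out in unsigned int and the
   result converted back, i.e. through a nop conversion.  */
int
or_unfolded (int b)
{
  b = (int) ((unsigned) b | 1u);
  b |= 2;
  return b;
}

/* What the optimized dump is expected to contain: a single "| 3".  */
int
or_folded (int b)
{
  return b | 3;
}

/* Compare both forms on a few boundary values.  */
int
folds_agree (void)
{
  static const int samples[] = { INT_MIN, -2, -1, 0, 1, 2, 3, 4, INT_MAX };
  for (size_t i = 0; i < sizeof samples / sizeof samples[0]; i++)
    if (or_unfolded (samples[i]) != or_folded (samples[i]))
      return 0;
  return 1;
}
```

Because the conversion does not change the bit pattern, hoisting it lets the two constant ORs meet in one type and combine, which is the same motivation the existing comment gives for the widening case.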
[committed] arc: Update arc specific tests
Update assembly output test pattern. Take into consideration also for which platform we do execute the test (baremetal or linux). gcc/testsuite/ChangeLog: * gcc.target/arc/add_n-combine.c: Update test patterns. * gcc.target/arc/builtin_eh.c: Update test for linux platforms. * gcc.target/arc/mul64-1.c: Disable this test while running on linux. * gcc.target/arc/tls-gd.c: Update matching patterns. * gcc.target/arc/tls-ie.c: Likewise. * gcc.target/arc/tls-ld.c: Likewise. * gcc.target/arc/uncached-8.c: Likewise. Signed-off-by: Claudiu Zissulescu --- gcc/testsuite/gcc.target/arc/add_n-combine.c | 4 ++-- gcc/testsuite/gcc.target/arc/builtin_eh.c| 3 ++- gcc/testsuite/gcc.target/arc/mul64-1.c | 2 +- gcc/testsuite/gcc.target/arc/tls-gd.c| 4 ++-- gcc/testsuite/gcc.target/arc/tls-ie.c| 4 ++-- gcc/testsuite/gcc.target/arc/tls-ld.c| 6 +++--- gcc/testsuite/gcc.target/arc/uncached-8.c| 5 +++-- 7 files changed, 15 insertions(+), 13 deletions(-) diff --git a/gcc/testsuite/gcc.target/arc/add_n-combine.c b/gcc/testsuite/gcc.target/arc/add_n-combine.c index bc400df669e..84e261ece8f 100644 --- a/gcc/testsuite/gcc.target/arc/add_n-combine.c +++ b/gcc/testsuite/gcc.target/arc/add_n-combine.c @@ -45,6 +45,6 @@ void f() { a(at3.bn[bu]); } -/* { dg-final { scan-assembler "add1" } } */ -/* { dg-final { scan-assembler "add2" } } */ +/* { dg-final { scan-assembler "@at1\\+1" } } */ +/* { dg-final { scan-assembler "@at2\\+2" } } */ /* { dg-final { scan-assembler "add3" } } */ diff --git a/gcc/testsuite/gcc.target/arc/builtin_eh.c b/gcc/testsuite/gcc.target/arc/builtin_eh.c index 717a54bb084..83f4f1d2ee0 100644 --- a/gcc/testsuite/gcc.target/arc/builtin_eh.c +++ b/gcc/testsuite/gcc.target/arc/builtin_eh.c @@ -19,4 +19,5 @@ foo (int x) /* { dg-final { scan-assembler "r13" } } */ /* { dg-final { scan-assembler "r0" } } */ /* { dg-final { scan-assembler "fp" } } */ -/* { dg-final { scan-assembler "fp,64" } } */ +/* { dg-final { scan-assembler "fp,64" { target { *-elf32-* } } } } */ +/* { dg-final { 
scan-assembler "fp,60" { target { *-linux-* } } } } */ diff --git a/gcc/testsuite/gcc.target/arc/mul64-1.c b/gcc/testsuite/gcc.target/arc/mul64-1.c index 2543fc33d3f..1a351feee87 100644 --- a/gcc/testsuite/gcc.target/arc/mul64-1.c +++ b/gcc/testsuite/gcc.target/arc/mul64-1.c @@ -1,5 +1,5 @@ /* { dg-do compile } */ -/* { dg-skip-if "MUL64 is ARC600 extension." { ! { clmcpu } } } */ +/* { dg-skip-if "MUL64 is ARC600 extension." { { ! { clmcpu } } || *-linux-* } } */ /* { dg-options "-O2 -mmul64 -mbig-endian -mcpu=arc600" } */ /* Check if mlo/mhi registers are correctly layout when we compile for diff --git a/gcc/testsuite/gcc.target/arc/tls-gd.c b/gcc/testsuite/gcc.target/arc/tls-gd.c index aa1b5429b08..d02af9537f8 100644 --- a/gcc/testsuite/gcc.target/arc/tls-gd.c +++ b/gcc/testsuite/gcc.target/arc/tls-gd.c @@ -13,5 +13,5 @@ int *ae2 (void) return &e2; } -/* { dg-final { scan-assembler "add r0,pcl,@e2@tlsgd" } } */ -/* { dg-final { scan-assembler "bl @__tls_get_addr@plt" } } */ +/* { dg-final { scan-assembler "add\\s+r0,pcl,@e2@tlsgd" } } */ +/* { dg-final { scan-assembler "bl\\s+@__tls_get_addr@plt" } } */ diff --git a/gcc/testsuite/gcc.target/arc/tls-ie.c b/gcc/testsuite/gcc.target/arc/tls-ie.c index 0c981cfbf67..f4ad635c4d3 100644 --- a/gcc/testsuite/gcc.target/arc/tls-ie.c +++ b/gcc/testsuite/gcc.target/arc/tls-ie.c @@ -13,5 +13,5 @@ int *ae2 (void) return &e2; } -/* { dg-final { scan-assembler "ld r0,\\\[pcl,@e2@tlsie\\\]" } } */ -/* { dg-final { scan-assembler "add_s r0,r0,r25" } } */ +/* { dg-final { scan-assembler "ld\\s+r0,\\\[pcl,@e2@tlsie\\\]" } } */ +/* { dg-final { scan-assembler "add_s\\s+r0,r0,r25" } } */ diff --git a/gcc/testsuite/gcc.target/arc/tls-ld.c b/gcc/testsuite/gcc.target/arc/tls-ld.c index 351c3f02abd..68ab9bf809c 100644 --- a/gcc/testsuite/gcc.target/arc/tls-ld.c +++ b/gcc/testsuite/gcc.target/arc/tls-ld.c @@ -13,6 +13,6 @@ int *ae2 (void) return &e2; } -/* { dg-final { scan-assembler "add r0,pcl,@.tbss@tlsgd" } } */ -/* { dg-final { 
scan-assembler "bl @__tls_get_addr@plt" } } */ -/* { dg-final { scan-assembler "add_s r0,r0,@e2@dtpoff" } } */ +/* { dg-final { scan-assembler "add\\s+r0,pcl,@.tbss@tlsgd" } } */ +/* { dg-final { scan-assembler "bl\\s+@__tls_get_addr@plt" } } */ +/* { dg-final { scan-assembler "add_s\\s+r0,r0,@e2@dtpoff" } } */ diff --git a/gcc/testsuite/gcc.target/arc/uncached-8.c b/gcc/testsuite/gcc.target/arc/uncached-8.c index 060229b11df..b5ea2359a9a 100644 --- a/gcc/testsuite/gcc.target/arc/uncached-8.c +++ b/gcc/testsuite/gcc.target/arc/uncached-8.c @@ -29,5 +29,6 @@ void bar (void) x.c.b.a = 10; } -/* { dg-final { scan-assembler-times "st\.di" 1 } } */ -/* { dg-final { scan-assembler-times "st\.as\.di" 1 } } */ +/* { dg-final { scan-assembler-times "st\.di" 2 { target { *-linux-* } } } } */ +/* { dg-final { scan-assembler-times "st\.di" 1 {
Re: [GCC-11 PATCH] aarch64: enable Ampere-1 CPU (backport to GCC11)
Philipp Tomsich writes: > This adds support and a basic turning model for the Ampere Computing > "Ampere-1" CPU. > > The Ampere-1 implements the ARMv8.6 architecture in A64 mode and is > modelled as a 4-wide issue (as with all modern micro-architectures, > the chosen issue rate is a compromise between the maximum dispatch > rate and the maximum rate of uops issued to the scheduler). > > This adds the -mcpu=ampere1 command-line option and the relevant cost > information/tuning tables for the Ampere-1. > > gcc/ChangeLog: > > * config/aarch64/aarch64-cores.def (AARCH64_CORE): New Ampere-1 > core. > * config/aarch64/aarch64-tune.md: Regenerate. > * config/aarch64/aarch64-cost-tables.h: Add extra costs for > Ampere-1. > * config/aarch64/aarch64.c: Add tuning structures for Ampere-1. > > (cherry picked from 67b0d47e20e655c0dd53a76ea88aab60fafb2059) > > --- > This is a backport from master and only affects the AArch64 backend. > > OK for GCC-11? Yes, thanks. Richard. > > gcc/config/aarch64/aarch64-cores.def | 3 +- > gcc/config/aarch64/aarch64-cost-tables.h | 104 +++ > gcc/config/aarch64/aarch64-tune.md | 2 +- > gcc/config/aarch64/aarch64.c | 78 + > gcc/doc/invoke.texi | 2 +- > 5 files changed, 186 insertions(+), 3 deletions(-) > > diff --git a/gcc/config/aarch64/aarch64-cores.def > b/gcc/config/aarch64/aarch64-cores.def > index b2aa1670561..4643e0e2795 100644 > --- a/gcc/config/aarch64/aarch64-cores.def > +++ b/gcc/config/aarch64/aarch64-cores.def > @@ -68,7 +68,8 @@ AARCH64_CORE("octeontx83",octeontxt83, thunderx, 8A, > AARCH64_FL_FOR_ARCH > AARCH64_CORE("thunderxt81", thunderxt81, thunderx, 8A, > AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, thunderx, 0x43, > 0x0a2, -1) > AARCH64_CORE("thunderxt83", thunderxt83, thunderx, 8A, > AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, thunderx, 0x43, > 0x0a3, -1) > > -/* Ampere Computing cores. */ > +/* Ampere Computing ('\xC0') cores. 
*/ > +AARCH64_CORE("ampere1", ampere1, cortexa57, 8_6A, AARCH64_FL_FOR_ARCH8_6, > ampere1, 0xC0, 0xac3, -1) > /* Do not swap around "emag" and "xgene1", > this order is required to handle variant correctly. */ > AARCH64_CORE("emag",emag, xgene1,8A, AARCH64_FL_FOR_ARCH8 > | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, emag, 0x50, 0x000, 3) > diff --git a/gcc/config/aarch64/aarch64-cost-tables.h > b/gcc/config/aarch64/aarch64-cost-tables.h > index dd2e7e7cbb1..4b7e4e034a2 100644 > --- a/gcc/config/aarch64/aarch64-cost-tables.h > +++ b/gcc/config/aarch64/aarch64-cost-tables.h > @@ -650,4 +650,108 @@ const struct cpu_cost_table a64fx_extra_costs = >} > }; > > +const struct cpu_cost_table ampere1_extra_costs = > +{ > + /* ALU */ > + { > +0, /* arith. */ > +0, /* logical. */ > +0, /* shift. */ > +COSTS_N_INSNS (1), /* shift_reg. */ > +0, /* arith_shift. */ > +COSTS_N_INSNS (1), /* arith_shift_reg. */ > +0, /* log_shift. */ > +COSTS_N_INSNS (1), /* log_shift_reg. */ > +0, /* extend. */ > +COSTS_N_INSNS (1), /* extend_arith. */ > +0, /* bfi. */ > +0, /* bfx. */ > +0, /* clz. */ > +0, /* rev. */ > +0, /* non_exec. */ > +true /* non_exec_costs_exec. */ > + }, > + { > +/* MULT SImode */ > +{ > + COSTS_N_INSNS (3), /* simple. */ > + COSTS_N_INSNS (3), /* flag_setting. */ > + COSTS_N_INSNS (3), /* extend. */ > + COSTS_N_INSNS (4), /* add. */ > + COSTS_N_INSNS (4), /* extend_add. */ > + COSTS_N_INSNS (18) /* idiv. */ > +}, > +/* MULT DImode */ > +{ > + COSTS_N_INSNS (3), /* simple. */ > + 0, /* flag_setting (N/A). */ > + COSTS_N_INSNS (3), /* extend. */ > + COSTS_N_INSNS (4), /* add. */ > + COSTS_N_INSNS (4), /* extend_add. */ > + COSTS_N_INSNS (34) /* idiv. */ > +} > + }, > + /* LD/ST */ > + { > +COSTS_N_INSNS (4), /* load. */ > +COSTS_N_INSNS (4), /* load_sign_extend. */ > +0, /* ldrd (n/a). */ > +0, /* ldm_1st. */ > +0, /* ldm_regs_per_insn_1st. */ > +0, /* ldm_regs_per_insn_subsequent. */ > +COSTS_N_INSNS (5), /* loadf. */ > +COSTS_N_INSNS (5), /* loadd. 
*/ > +COSTS_N_INSNS (5), /* load_unaligned. */ > +0, /* store. */ > +0, /* strd. */ > +0, /* stm_1st. */ > +0, /* stm_regs_per_insn_1st. */ > +
Re: [PATCH] tree-optimization: [PR103218] Fold ((type)(a<0)) << SIGNBITOFA into ((type)a) & signbit
On Sat, Nov 13, 2021 at 9:14 PM apinski--- via Gcc-patches wrote:
>
> From: Andrew Pinski
>
> This folds ((type)(a<0)) << SIGNBITOFA into ((type)a) & signbit inside
> match.pd.
> This was already handled in fold-const by:
> /* A < 0 ? <sign bit of A> : 0 is simply (A & <sign bit of A>). */
> I have not removed it, as we only simplify "a ? POW2 : 0" at the gimple
> level to "a << CST1"
> and fold actually does the reverse of folding "(a<0)< 1<
>
> OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

OK.

Thanks,
Richard.

>     PR tree-optimization/103218
>
> gcc/ChangeLog:
>
>     * match.pd: New pattern for "((type)(a<0)) << SIGNBITOFA".
>
> gcc/testsuite/ChangeLog:
>
>     * gcc.dg/tree-ssa/pr103218-1.c: New test.
> ---
>  gcc/match.pd | 10
>  gcc/testsuite/gcc.dg/tree-ssa/pr103218-1.c | 28 ++
>  2 files changed, 38 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr103218-1.c
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index a319aefa808..df31964e02f 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -865,6 +865,16 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>      { tree utype = unsigned_type_for (type); }
>      (convert (rshift (lshift (convert:utype @0) @2) @3))
>
> +/* Fold ((type)(a<0)) << SIGNBITOFA into ((type)a) & signbit. */
> +(simplify
> + (lshift (convert (lt @0 integer_zerop@1)) INTEGER_CST@2)
> + (if (TYPE_SIGN (TREE_TYPE (@0)) == SIGNED
> +      && wi::eq_p (wi::to_wide (@2), TYPE_PRECISION (TREE_TYPE (@0)) - 1))
> +  (with { wide_int wone = wi::one (TYPE_PRECISION (type)); }
> +   (bit_and (convert @0)
> +            { wide_int_to_tree (type,
> +                                wi::lshift (wone, wi::to_wide (@2))); }
> +
>  /* Fold (-x >> C) into -(x > 0) where C = precision(type) - 1. */
>  (for cst (INTEGER_CST VECTOR_CST)
>   (simplify
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr103218-1.c
> b/gcc/testsuite/gcc.dg/tree-ssa/pr103218-1.c
> new file mode 100644
> index 000..f086f073b38
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr103218-1.c
> @@ -0,0 +1,28 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-optimized" } */
> +/* PR tree-optimization/103218 */
> +
> +/* These first two are removed during forwprop1 */
> +signed char f(signed char a)
> +{
> +  signed char t = a < 0;
> +  int tt = (unsigned char)(t << 7);
> +  return tt;
> +}
> +signed char f0(signed char a)
> +{
> +  unsigned char t = a < 0;
> +  int tt = (unsigned char)(t << 7);
> +  return tt;
> +}
> +
> +/* This one is removed during phiopt. */
> +signed char f1(signed char a)
> +{
> +    if (a < 0)
> +      return 1u<<7;
> +    return 0;
> +}
> +
> +/* These three examples should remove "a < 0" by optimized. */
> +/* { dg-final { scan-tree-dump-times "< 0" 0 "optimized"} } */
> --
> 2.17.1
>
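
The identity behind the new pattern can be checked exhaustively for an 8-bit type. The sketch below is only an illustration with made-up names (`shift_form`, `mask_form`, `forms_agree`) and assumes an 8-bit `signed char` in two's complement, as on the targets mentioned in this thread; it does not exercise match.pd itself.

```c
/* ((type)(a < 0)) << 7: the comparison yields 0 or 1, which is then
   shifted into the sign-bit position of an 8-bit value.  */
unsigned char
shift_form (signed char a)
{
  return (unsigned char) ((unsigned char) (a < 0) << 7);
}

/* ((type)a) & signbit: keep only the top bit of the converted value.  */
unsigned char
mask_form (signed char a)
{
  return (unsigned char) ((unsigned char) a & 0x80);
}

/* Compare both forms for every signed char value.  */
int
forms_agree (void)
{
  for (int v = -128; v <= 127; v++)
    if (shift_form ((signed char) v) != mask_form ((signed char) v))
      return 0;
  return 1;
}
```

The two forms agree because, for a signed operand, "a < 0" is exactly the value of the sign bit, so shifting it back to the sign-bit position reproduces what the mask extracts directly.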
[PATCH 0/2][GCC] arm: Define MVE types internally
Hi all,

This patch series implements the Arm MVE ACLE types currently found under config/arm/arm_mve_types.h internally via a new pragma. Exposing the MVE ACLE types internally allows for an MVE intrinsics implementation similar to the current SVE implementation.

Any prefix of the patch series should build and pass regression tests.

Thanks,
Murray

---

Murray Steele (2):
  arm: Move arm_simd_info array declaration into header
  arm: Define MVE types internally via pragma

 gcc/config.gcc | 2 +-
 gcc/config/arm/arm-builtins.c | 87 +---
 gcc/config/arm/arm-builtins.h | 87
 gcc/config/arm/arm-c.c | 21 ++
 gcc/config/arm/arm-mve-builtins.cc | 192 ++
 gcc/config/arm/arm-mve-builtins.def | 41
 gcc/config/arm/arm-mve-builtins.h | 34
 gcc/config/arm/arm-protos.h | 5 +
 gcc/config/arm/arm_mve_types.h | 30 +--
 gcc/config/arm/t-arm | 10 +
 .../arm/mve/general-c/type_redef_1.c | 7 +
 .../arm/mve/general-c/type_redef_10.c | 7 +
 .../arm/mve/general-c/type_redef_11.c | 7 +
 .../arm/mve/general-c/type_redef_12.c | 7 +
 .../arm/mve/general-c/type_redef_13.c | 7 +
 .../arm/mve/general-c/type_redef_14.c | 7 +
 .../arm/mve/general-c/type_redef_15.c | 7 +
 .../arm/mve/general-c/type_redef_16.c | 7 +
 .../arm/mve/general-c/type_redef_17.c | 7 +
 .../arm/mve/general-c/type_redef_18.c | 7 +
 .../arm/mve/general-c/type_redef_19.c | 7 +
 .../arm/mve/general-c/type_redef_2.c | 7 +
 .../arm/mve/general-c/type_redef_20.c | 7 +
 .../arm/mve/general-c/type_redef_21.c | 7 +
 .../arm/mve/general-c/type_redef_22.c | 7 +
 .../arm/mve/general-c/type_redef_23.c | 7 +
 .../arm/mve/general-c/type_redef_24.c | 7 +
 .../arm/mve/general-c/type_redef_25.c | 7 +
 .../arm/mve/general-c/type_redef_26.c | 7 +
 .../arm/mve/general-c/type_redef_27.c | 7 +
 .../arm/mve/general-c/type_redef_28.c | 7 +
 .../arm/mve/general-c/type_redef_29.c | 7 +
 .../arm/mve/general-c/type_redef_3.c | 7 +
 .../arm/mve/general-c/type_redef_30.c | 7 +
 .../arm/mve/general-c/type_redef_31.c | 7 +
 .../arm/mve/general-c/type_redef_4.c | 7 +
 .../arm/mve/general-c/type_redef_5.c | 7 +
 .../arm/mve/general-c/type_redef_6.c | 7 +
 .../arm/mve/general-c/type_redef_7.c | 7 +
 .../arm/mve/general-c/type_redef_8.c | 7 +
 .../arm/mve/general-c/type_redef_9.c | 7 +
 .../arm/mve/general/double_pragmas_1.c | 8 +
 .../gcc.target/arm/mve/general/nomve_1.c | 3 +
 gcc/testsuite/gcc.target/arm/mve/mve.exp | 6 +
 44 files changed, 627 insertions(+), 116 deletions(-)
 create mode 100644 gcc/config/arm/arm-mve-builtins.cc
 create mode 100644 gcc/config/arm/arm-mve-builtins.def
 create mode 100644 gcc/config/arm/arm-mve-builtins.h
 create mode 100644 gcc/testsuite/gcc.target/arm/mve/general-c/type_redef_1.c
 create mode 100644 gcc/testsuite/gcc.target/arm/mve/general-c/type_redef_10.c
 create mode 100644 gcc/testsuite/gcc.target/arm/mve/general-c/type_redef_11.c
 create mode 100644 gcc/testsuite/gcc.target/arm/mve/general-c/type_redef_12.c
 create mode 100644 gcc/testsuite/gcc.target/arm/mve/general-c/type_redef_13.c
 create mode 100644 gcc/testsuite/gcc.target/arm/mve/general-c/type_redef_14.c
 create mode 100644 gcc/testsuite/gcc.target/arm/mve/general-c/type_redef_15.c
 create mode 100644 gcc/testsuite/gcc.target/arm/mve/general-c/type_redef_16.c
 create mode 100644 gcc/testsuite/gcc.target/arm/mve/general-c/type_redef_17.c
 create mode 100644 gcc/testsuite/gcc.target/arm/mve/general-c/type_redef_18.c
 create mode 100644 gcc/testsuite/gcc.target/arm/mve/general-c/type_redef_19.c
 create mode 100644 gcc/testsuite/gcc.target/arm/mve/general-c/type_redef_2.c
 create mode 100644 gcc/testsuite/gcc.target/arm/mve/general-c/type_redef_20.c
 create mode 100644 gcc/testsuite/gcc.target/arm/mve/general-c/type_redef_21.c
 create mode 100644 gcc/testsuite/gcc.target/arm/mve/general-c/type_redef_22.c
 create mode 100644 gcc/testsuite/gcc.target/arm/mve/general-c/type_redef_23.c
 create mode 100644 gcc/testsuite/gcc.target/arm/mve/general-c/type_redef_24.c
 create mode 100644 gcc/testsuite/gcc.target/arm/mve/general-c/type_redef_25.c
 create mode 100644 gcc/testsuite/gcc.target/arm/mve/general-c/type_redef_26.c
 create mode 100644 gcc/testsuite/gcc.target/arm/mve/general-c/type_redef_27.c
 create mode 100644 gcc/testsuite/gcc.target/arm/mve/general-c/type_redef_28.c
 create mode 100644 gcc/testsuite/gcc.target/arm/mve/general-c/type_redef_29.c
 create mode 100644 gcc/testsuite/gcc.target/arm/mve/general-c/type_redef_3.c
 create mode 100644 gcc
[PATCH 1/2][GCC] arm: Move arm_simd_info array declaration into header
Hi all, This patch moves the arm_simd_type and arm_type_qualifiers enums, and arm_simd_info struct from arm-builtins.c into arm-builtins.h header. This is a first step towards internalising the type definitions for MVE predicate, vector, and tuple types. By moving arm_simd_types into a header, we allow future patches to use these type trees externally to arm-builtins.c, which is a crucial step towards developing an MVE intrinsics framework similar to the current SVE implementation. Thanks, Murray gcc/ChangeLog: * config/arm/arm-builtins.c (enum arm_type_qualifiers): Move to arm_builtins.h (enum arm_simd_type): Move to arm-builtins.h (struct arm_simd_type_info): Move to arm-builtins.h * config/arm/arm-builtins.h (enum arm_simd_type): Move from arm-builtins.c (enum arm_type_qualifiers): Move from arm-builtins.c (struct arm_simd_type_info): Move from arm-builtins.c diff --git a/gcc/config/arm/arm-builtins.h b/gcc/config/arm/arm-builtins.h index bee9f9bb83758820ca7faedf80b7e138026c1ca0..a40fa8950707314d3cc1372fb5c47a8891a18516 100644 --- a/gcc/config/arm/arm-builtins.h +++ b/gcc/config/arm/arm-builtins.h @@ -32,4 +32,91 @@ enum resolver_ident { enum resolver_ident arm_describe_resolver (tree); unsigned arm_cde_end_args (tree); +#define ENTRY(E, M, Q, S, T, G) E, +enum arm_simd_type +{ +#include "arm-simd-builtin-types.def" + __TYPE_FINAL +}; +#undef ENTRY + +enum arm_type_qualifiers +{ + /* T foo. */ + qualifier_none = 0x0, + /* unsigned T foo. */ + qualifier_unsigned = 0x1, /* 1 << 0 */ + /* const T foo. */ + qualifier_const = 0x2, /* 1 << 1 */ + /* T *foo. */ + qualifier_pointer = 0x4, /* 1 << 2 */ + /* const T * foo. */ + qualifier_const_pointer = 0x6, + /* Used when expanding arguments if an operand could + be an immediate. */ + qualifier_immediate = 0x8, /* 1 << 3 */ + qualifier_unsigned_immediate = 0x9, + qualifier_maybe_immediate = 0x10, /* 1 << 4 */ + /* void foo (...). 
*/ + qualifier_void = 0x20, /* 1 << 5 */ + /* Some patterns may have internal operands, this qualifier is an + instruction to the initialisation code to skip this operand. */ + qualifier_internal = 0x40, /* 1 << 6 */ + /* Some builtins should use the T_*mode* encoded in a simd_builtin_datum + rather than using the type of the operand. */ + qualifier_map_mode = 0x80, /* 1 << 7 */ + /* qualifier_pointer | qualifier_map_mode */ + qualifier_pointer_map_mode = 0x84, + /* qualifier_const_pointer | qualifier_map_mode */ + qualifier_const_pointer_map_mode = 0x86, + /* Polynomial types. */ + qualifier_poly = 0x100, + /* Lane indices - must be within range of previous argument = a vector. */ + qualifier_lane_index = 0x200, + /* Lane indices for single lane structure loads and stores. */ + qualifier_struct_load_store_lane_index = 0x400, + /* A void pointer. */ + qualifier_void_pointer = 0x800, + /* A const void pointer. */ + qualifier_const_void_pointer = 0x802, + /* Lane indices selected in pairs - must be within range of previous + argument = a vector. */ + qualifier_lane_pair_index = 0x1000, + /* Lane indices selected in quadtuplets - must be within range of previous + argument = a vector. */ + qualifier_lane_quadtup_index = 0x2000 +}; + +struct arm_simd_type_info +{ + enum arm_simd_type type; + + /* Internal type name. */ + const char *name; + + /* Internal type name(mangled). The mangled names conform to the + AAPCS (see "Procedure Call Standard for the ARM Architecture", + Appendix A). To qualify for emission with the mangled names defined in + that document, a vector type must not only be of the correct mode but also + be of the correct internal Neon vector type (e.g. __simd64_int8_t); + these types are registered by arm_init_simd_builtin_types (). In other + words, vector types defined in other ways e.g. via vector_size attribute + will get default mangled names. */ + const char *mangle; + + /* Internal type. */ + tree itype; + + /* Element type. 
*/ + tree eltype; + + /* Machine mode the internal type maps to. */ + machine_mode mode; + + /* Qualifiers. */ + enum arm_type_qualifiers q; +}; + +extern struct arm_simd_type_info arm_simd_types[]; + #endif /* GCC_ARM_BUILTINS_H */ diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c index 3a9ff8f26b8e222c52cb70f7509b714c3e475758..b6bf31349d8f0e996a6c169b061ebe05a2cf9acb 100644 --- a/gcc/config/arm/arm-builtins.c +++ b/gcc/config/arm/arm-builtins.c @@ -48,53 +48,6 @@ #define SIMD_MAX_BUILTIN_ARGS 7 -enum arm_type_qualifiers -{ - /* T foo. */ - qualifier_none = 0x0, - /* unsigned T foo. */ - qualifier_unsigned = 0x1, /* 1 << 0 */ - /* const T foo. */ - qualifier_const = 0x2, /* 1 << 1 */ - /* T *foo. */ - qualifier_pointer = 0x4, /* 1 << 2 */ - /* const T * foo. */ - qualifier_const_pointer = 0x6, - /* Used when expanding arguments if an o
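As an aside on the qualifier values being moved here: the composite enumerators are bitwise ORs of the single-bit ones. The following is only a standalone sketch that copies a handful of values from the patch; the helper has_qualifier is hypothetical and is not part of GCC.

```c
#include <stdbool.h>

/* A subset of the values from enum arm_type_qualifiers in the patch.  */
enum arm_type_qualifiers
{
  qualifier_none = 0x0,
  qualifier_unsigned = 0x1,            /* 1 << 0 */
  qualifier_const = 0x2,               /* 1 << 1 */
  qualifier_pointer = 0x4,             /* 1 << 2 */
  qualifier_const_pointer = 0x6,       /* const | pointer */
  qualifier_immediate = 0x8,           /* 1 << 3 */
  qualifier_unsigned_immediate = 0x9,  /* unsigned | immediate */
  qualifier_map_mode = 0x80,           /* 1 << 7 */
  qualifier_const_pointer_map_mode = 0x86
};

/* Hypothetical helper (not in GCC): true if every bit of FLAG is set
   in Q.  */
static bool
has_qualifier (int q, int flag)
{
  return (q & flag) == flag;
}
```

With this encoding a test such as has_qualifier (q, qualifier_pointer) is true for the plain, const and map_mode pointer variants alike.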
[PATCH 2/2][GCC] arm: Declare MVE types internally via pragma
Hi all, This patch moves the implementation of MVE ACLE types from arm_mve_types.h to inside GCC via a new pragma, which replaces the prior type definitions. This allows for the types to be used internally for intrinsic function definitions. Bootstrapped and regression tested on arm-none-linux-gnuabihf, and regression tested on arm-eabi -- no issues. Thanks, Murray gcc/ChangeLog: * config.gcc: Add arm-mve-builtins.o to extra_objs for arm-*-*-* targets. * config/arm/arm-c.c (arm_pragma_arm): Handle new pragma. (arm_register_target_pragmas): Register new pragma. * config/arm/arm-protos.h: Add arm_mve namespace and declare arm_handle_mve_types_h. * config/arm/arm_mve_types.h: Replace MVE type definitions with new pragma. * config/arm/t-arm: Add arm-mve-builtins.o target. * config/arm/arm-mve-builtins.cc: New file. * config/arm/arm-mve-builtins.def: New file. * config/arm/arm-mve-builtins.h: New file. gcc/testsuite/ChangeLog: * gcc.target/arm/mve/mve.exp: Add new subdirectories. * gcc.target/arm/mve/general-c/type_redef_1.c: New test. * gcc.target/arm/mve/general-c/type_redef_10.c: New test. * gcc.target/arm/mve/general-c/type_redef_11.c: New test. * gcc.target/arm/mve/general-c/type_redef_12.c: New test. * gcc.target/arm/mve/general-c/type_redef_13.c: New test. * gcc.target/arm/mve/general-c/type_redef_14.c: New test. * gcc.target/arm/mve/general-c/type_redef_15.c: New test. * gcc.target/arm/mve/general-c/type_redef_16.c: New test. * gcc.target/arm/mve/general-c/type_redef_17.c: New test. * gcc.target/arm/mve/general-c/type_redef_18.c: New test. * gcc.target/arm/mve/general-c/type_redef_19.c: New test. * gcc.target/arm/mve/general-c/type_redef_2.c: New test. * gcc.target/arm/mve/general-c/type_redef_20.c: New test. * gcc.target/arm/mve/general-c/type_redef_21.c: New test. * gcc.target/arm/mve/general-c/type_redef_22.c: New test. * gcc.target/arm/mve/general-c/type_redef_23.c: New test. * gcc.target/arm/mve/general-c/type_redef_24.c: New test. 
* gcc.target/arm/mve/general-c/type_redef_25.c: New test. * gcc.target/arm/mve/general-c/type_redef_26.c: New test. * gcc.target/arm/mve/general-c/type_redef_27.c: New test. * gcc.target/arm/mve/general-c/type_redef_28.c: New test. * gcc.target/arm/mve/general-c/type_redef_29.c: New test. * gcc.target/arm/mve/general-c/type_redef_3.c: New test. * gcc.target/arm/mve/general-c/type_redef_30.c: New test. * gcc.target/arm/mve/general-c/type_redef_31.c: New test. * gcc.target/arm/mve/general-c/type_redef_4.c: New test. * gcc.target/arm/mve/general-c/type_redef_5.c: New test. * gcc.target/arm/mve/general-c/type_redef_6.c: New test. * gcc.target/arm/mve/general-c/type_redef_7.c: New test. * gcc.target/arm/mve/general-c/type_redef_8.c: New test. * gcc.target/arm/mve/general-c/type_redef_9.c: New test. * gcc.target/arm/mve/general/double_pragmas_1.c: New test. * gcc.target/arm/mve/general/nomve_1.c: New test. diff --git a/gcc/config.gcc b/gcc/config.gcc index 3675e063a5365ff84854eb5c2c27921216494c69..50d3401e3aa94f077d7e0675ee443a94431dba1e 100644 --- a/gcc/config.gcc +++ b/gcc/config.gcc @@ -352,7 +352,7 @@ arc*-*-*) ;; arm*-*-*) cpu_type=arm - extra_objs="arm-builtins.o aarch-common.o" + extra_objs="arm-builtins.o arm-mve-builtins.o aarch-common.o" extra_headers="mmintrin.h arm_neon.h arm_acle.h arm_fp16.h arm_cmse.h arm_bf16.h arm_mve_types.h arm_mve.h arm_cde.h" target_type_format_char='%' c_target_objs="arm-c.o" diff --git a/gcc/config/arm/arm-c.c b/gcc/config/arm/arm-c.c index cc7901bca8dc9c5c27ed6afc5bc26afd42689e6d..d1414f6e0e1c2bd0a7364b837c16adf493221376 100644 --- a/gcc/config/arm/arm-c.c +++ b/gcc/config/arm/arm-c.c @@ -28,6 +28,7 @@ #include "c-family/c-pragma.h" #include "stringpool.h" #include "arm-builtins.h" +#include "arm-protos.h" tree arm_resolve_cde_builtin (location_t loc, tree fndecl, void *arglist) @@ -129,6 +130,24 @@ arm_resolve_cde_builtin (location_t loc, tree fndecl, void *arglist) return call_expr; } +/* Implement "#pragma GCC arm". 
*/ +static void +arm_pragma_arm (cpp_reader *) +{ + tree x; + if (pragma_lex (&x) != CPP_STRING) +{ + error ("%<#pragma GCC arm%> requires a string parameter"); + return; +} + + const char *name = TREE_STRING_POINTER (x); + if (strcmp (name, "arm_mve_types.h") == 0) +arm_mve::handle_arm_mve_types_h (); + else +error ("unknown %<#pragma GCC arm%> option %qs", name); +} + /* Implement TARGET_RESOLVE_OVERLOADED_BUILTIN. This is currently only used for the MVE related builtins for the CDE extension. Here we ensure the type of arguments is such
Re: [PATCH] PR tree-optimization/103216: optimize some A ? (b op CST) : b into b op (A?CST:CST2)
On Mon, Nov 15, 2021 at 1:09 AM apinski--- via Gcc-patches wrote: > > From: Andrew Pinski > > For this PR, we have: > if (d_5 < 0) > goto ; [INV] > else > goto ; [INV] > >: > v_7 = c_4 | -128; > >: > # v_1 = PHI > > Which PHI-OPT will try to simplify > "(d_5 < 0) ? (c_4 | -128) : c_4" which is not handled currently. > This adds a few patterns which allows to try to see if (a ? CST : CST1) > where CST1 is either 0, 1 or -1 depending on the operator. > Note to optimize this case always, we should check to make sure that > the a?CST:CST1 gets simplified to not include the conditional expression. > The ! flag does not work as we want to have more simplifcations than just > when we simplify it to a leaf node (SSA_NAME or CONSTANT). This adds a new > flag ^ to genmatch which says the simplification should happen but not down > to the same kind of node. > We could allow this for !GIMPLE and use fold_* rather than fold_buildN but I > didn't see any use of it for now. > > Also all of these patterns need to be done late as other optimizations can be > done without them. > > OK? Bootstrapped and tested on x86_64 with no regressions. > > gcc/ChangeLog: > > * doc/match-and-simplify.texi: Document ^ flag. > * genmatch.c (expr::expr): Add Setting of force_simplify. > (expr): Add force_simplify field. > (expr::gen_transform): Add support for force_simplify field. > (parser::parse_expr): Add parsing of ^ flag for the expr. > * match.pd: New patterns to optimize "a ? (b op CST) : b". > --- > gcc/doc/match-and-simplify.texi | 16 + > gcc/genmatch.c | 35 ++-- > gcc/match.pd| 41 + > 3 files changed, 90 insertions(+), 2 deletions(-) > > diff --git a/gcc/doc/match-and-simplify.texi b/gcc/doc/match-and-simplify.texi > index e7e5a4f7299..4e3407c0263 100644 > --- a/gcc/doc/match-and-simplify.texi > +++ b/gcc/doc/match-and-simplify.texi > @@ -377,6 +377,22 @@ of the @code{vec_cond} expression but only if the actual > plus > operations both simplify. 
Note this is currently only supported > for code generation targeting @code{GIMPLE}. > > +Another modifier for generated expressions is @code{^} which > +tells the machinery to only consider the simplification in case > +the marked expression simplified away from the original code. > +Consider for example > + > +@smallexample > +(simplify > + (cond @@0 (plus:s @@1 INTEGER_CST@@2) @@1) > + (plus @@1 (cond^ @@0 @@2 @{ build_zero_cst (type); @}))) > +@end smallexample > + > +which moves the inner @code{plus} operation to the outside of the > +@code{cond} expression but only if the actual cond operation simplify > +wayaway from cond. Note this is currently only supported for code s/wayaway/away/ > +generation targeting @code{GIMPLE}. > + > As intermediate conversions are often optional there is a way to > avoid the need to repeat patterns both with and without such > conversions. Namely you can mark a conversion as being optional > diff --git a/gcc/genmatch.c b/gcc/genmatch.c > index 95248455ec5..2dca1141df6 100644 > --- a/gcc/genmatch.c > +++ b/gcc/genmatch.c > @@ -698,12 +698,13 @@ public: > : operand (OP_EXPR, loc), operation (operation_), >ops (vNULL), expr_type (NULL), is_commutative (is_commutative_), >is_generic (false), force_single_use (false), force_leaf (false), > - opt_grp (0) {} > + force_simplify(false), opt_grp (0) {} >expr (expr *e) > : operand (OP_EXPR, e->location), operation (e->operation), >ops (vNULL), expr_type (e->expr_type), is_commutative > (e->is_commutative), >is_generic (e->is_generic), force_single_use (e->force_single_use), > - force_leaf (e->force_leaf), opt_grp (e->opt_grp) {} > + force_leaf (e->force_leaf), force_simplify(e->force_simplify), > + opt_grp (e->opt_grp) {} >void append_op (operand *op) { ops.safe_push (op); } >/* The operator and its operands. */ >id_base *operation; > @@ -721,6 +722,9 @@ public: >/* Whether in the result expression this should be a leaf node > with any children simplified down to simple operands. 
*/ >bool force_leaf; > + /* Whether in the result expression this should be a node > + with any children simplified down not to use the original operator. */ > + bool force_simplify; >/* If non-zero, the group for optional handling. */ >unsigned char opt_grp; >virtual void gen_transform (FILE *f, int, const char *, bool, int, > @@ -2527,6 +2531,17 @@ expr::gen_transform (FILE *f, int indent, const char > *dest, bool gimple, > fprintf (f, ", _o%d[%u]", depth, i); >fprintf (f, ");\n"); >fprintf_indent (f, indent, "tem_op.resimplify (lseq, valueize);\n"); I wonder if with force_simplify we should pass NULL as lseq to resimplify? That is, should we allow (plus^ (convert @0) @1) to simplify to (convert (plus
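For readers following along, the source-level effect of the proposed patterns can be sketched in plain C. This is only an illustration of the equivalence "a ? (b op CST) : b" == "b op (a ? CST : CST2)"; the function names are made up, and picking CST2 as the identity element of the operator (0 for | and +, -1 for &, 1 for *) is my reading of the "0, 1 or -1 depending on the operator" remark in the patch description.

```c
/* Original form: a ? (b op CST) : b.  */
static int or_before (int c, int d)  { return d < 0 ? (c | -128) : c; }
static int add_before (int c, int d) { return d < 0 ? (c + 5) : c; }
static int and_before (int c, int d) { return d < 0 ? (c & 15) : c; }
static int mul_before (int c, int d) { return d < 0 ? (c * 3) : c; }

/* Transformed form: b op (a ? CST : CST2), where CST2 is assumed to be
   the identity element of the operator, so the "else" arm leaves b
   unchanged.  */
static int or_after (int c, int d)  { return c | (d < 0 ? -128 : 0); }
static int add_after (int c, int d) { return c + (d < 0 ? 5 : 0); }
static int and_after (int c, int d) { return c & (d < 0 ? 15 : -1); }
static int mul_after (int c, int d) { return c * (d < 0 ? 3 : 1); }
```

The transformed form only pays off when the inner conditional itself simplifies further, which is what the new ^ flag is meant to enforce.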
RE: [vect-patterns] Refactor widen_plus/widen_minus as internal_fns
Updated patch 2 with explanation included in commit message and changes requested. Bootstrapped and regression tested on aarch64 > -Original Message- > From: Joel Hutton > Sent: 12 November 2021 11:42 > To: Richard Biener > Cc: gcc-patches@gcc.gnu.org; Richard Sandiford > > Subject: RE: [vect-patterns] Refactor widen_plus/widen_minus as > internal_fns > > > please use #define INCLUDE_MAP before the system.h include instead. Done. > > Is it really necessary to build a new std::map for each optab lookup?! > > That looks quite ugly and inefficient. We'd usually - if necessary at > > all - build a auto_vec > and .sort () and .bsearch () > > it. > Ok, I'll rework this part. In the meantime, to address your other comment. Done. > > I'm not sure I understand DEF_INTERNAL_OPTAB_MULTI_FN, neither this > > cover letter nor the patch ChangeLog explains anything. > > I'll attempt to clarify, if this makes things clearer I can include this in > the > commit message of the respun patch: > > DEF_INTERNAL_OPTAB_MULTI_FN is like DEF_INTERNAL_OPTAB_FN except it > provides convenience wrappers for defining conversions that require a hi/lo > split, like widening and narrowing operations. Each definition for > will require an optab named and two other optabs that you specify > for signed and unsigned. The hi/lo pair is necessary because the widening > operations take n narrow elements as inputs and return n/2 wide elements > as outputs. The 'lo' operation operates on the first n/2 elements of input. > The 'hi' operation operates on the second n/2 elements of input. Defining an > internal_fn along with hi/lo variations allows a single internal function to > be > returned from a vect_recog function that will later be expanded to hi/lo. > > DEF_INTERNAL_OPTAB_MULTI_FN is used in internal-fn.def to register a > widening internal_fn. It is defined differently in different places and > internal- > fn.def is sourced from those places so the parameters given can be reused. 
> internal-fn.c: defined to expand to hi/lo signed/unsigned optabs, later > defined to generate the 'expand_' functions for the hi/lo versions of the fn. > internal-fn.def: defined to invoke DEF_INTERNAL_OPTAB_FN for the original > and hi/lo variants of the internal_fn > > For example: > IFN_VEC_WIDEN_PLUS -> IFN_VEC_WIDEN_PLUS_HI, > IFN_VEC_WIDEN_PLUS_LO > for aarch64: IFN_VEC_WIDEN_PLUS_HI -> vec_widen_addl_hi_ > -> (u/s)addl2 >IFN_VEC_WIDEN_PLUS_LO -> vec_widen_addl_lo_ > -> (u/s)addl > > This gives the same functionality as the previous > WIDEN_PLUS/WIDEN_MINUS tree codes which are expanded into > VEC_WIDEN_PLUS_LO, VEC_WIDEN_PLUS_HI. > > Let me know if I'm not expressing this clearly. > > Thanks, > Joel 0001-vect-patterns-Refactor-to-allow-internal_fn-s.patch Description: 0001-vect-patterns-Refactor-to-allow-internal_fn-s.patch 0002-vect-patterns-Refactor-widen_plus-as-internal_fn.patch Description: 0002-vect-patterns-Refactor-widen_plus-as-internal_fn.patch 0003-Remove-widen_plus-minus_expr-tree-codes.patch Description: 0003-Remove-widen_plus-minus_expr-tree-codes.patch
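To make the hi/lo split concrete, here is a scalar model of a widening add, following the description quoted above: n narrow inputs, n/2 wide outputs per operation, "lo" taking the first half of the input and "hi" the second. The function names and the int8_t/int16_t element types are assumptions for illustration only; this is not how the vectorizer represents the operation internally.

```c
#include <stddef.h>
#include <stdint.h>

/* "lo" half: widen and add lanes [0, n/2) of A and B into OUT.  */
static void
widen_plus_lo (const int8_t *a, const int8_t *b, int16_t *out, size_t n)
{
  for (size_t i = 0; i < n / 2; i++)
    out[i] = (int16_t) a[i] + (int16_t) b[i];
}

/* "hi" half: widen and add lanes [n/2, n) of A and B into OUT.  */
static void
widen_plus_hi (const int8_t *a, const int8_t *b, int16_t *out, size_t n)
{
  for (size_t i = 0; i < n / 2; i++)
    out[i] = (int16_t) a[n / 2 + i] + (int16_t) b[n / 2 + i];
}
```

Each call writes n/2 results, so a full widening add over n lanes needs both the lo and the hi operation, which is why a single internal function is expanded into the pair.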
[PATCH] tree-optimization/102880 - improve CD-DCE
The PR shows a missed control-dependent DCE caused by CFG cleanup merging a forwarder, resulting in a partially degenerate PHI node. With control-dependent DCE we need to mark control dependences of incoming edges into PHIs as necessary, but that is unnecessarily conservative for the case when two edges have the same value. There is no easy way to mark only a subset of control dependences of both edges necessary, so the fix is to produce forwarder blocks so that the control dependence captures the requirements more precisely.

For gcc.dg/tree-ssa/ssa-dom-thread-7.c the number of edges in the CFG decreases as we have commonized PHI arguments, which in turn results in different threadings. The testcase is too complex and the dump scanning too simple to do anything meaningful here other than adjust the number of expected threads.

The same CFG massaging could be useful at RTL expansion time to reduce the number of copies we need to insert on edges.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

2021-11-12 Richard Biener

	PR tree-optimization/102880
	* tree-ssa-dce.c (sort_phi_args): New function.
	(make_forwarders_with_degenerate_phis): Likewise.
	(perform_tree_ssa_dce): Call make_forwarders_with_degenerate_phis.

	* gcc.dg/tree-ssa/pr102880.c: New testcase.
	* gcc.dg/tree-ssa/pr69270-3.c: Robustify.
	* gcc.dg/tree-ssa/ssa-dom-thread-7.c: Change the number of expected threadings.
--- gcc/testsuite/gcc.dg/tree-ssa/pr102880.c | 27 +++ gcc/testsuite/gcc.dg/tree-ssa/pr69270-3.c | 2 +- .../gcc.dg/tree-ssa/ssa-dom-thread-7.c| 2 +- gcc/tree-ssa-dce.c| 171 +- 4 files changed, 196 insertions(+), 6 deletions(-) create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr102880.c diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr102880.c b/gcc/testsuite/gcc.dg/tree-ssa/pr102880.c new file mode 100644 index 000..0306deedb6c --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr102880.c @@ -0,0 +1,27 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-tree-optimized" } */ + +void foo(void); + +static int b, c, d, e, f, ah; +static short g, ai, am, aq, as; +static char an, at, av, ax, ay; +static char a(char h, char i) { return i == 0 || h && i == 1 ? 0 : h % i; } +static void ae(int h) { + if (a(b, h)) +foo(); + +} +int main() { + ae(1); + ay = a(0, ay); + ax = a(g, aq); + at = a(0, as); + av = a(c, 1); + an = a(am, f); + int al = e || ((a(1, ah) && b) & d) == 2; + ai = al; +} + +/* We should eliminate the call to foo. */ +/* { dg-final { scan-tree-dump-not "foo" "optimized" } } */ diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr69270-3.c b/gcc/testsuite/gcc.dg/tree-ssa/pr69270-3.c index 89735f67de2..5ffd5f71506 100644 --- a/gcc/testsuite/gcc.dg/tree-ssa/pr69270-3.c +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr69270-3.c @@ -3,7 +3,7 @@ /* We're looking for a constant argument a PHI node. There should only be one if we unpropagate correctly. 
*/ -/* { dg-final { scan-tree-dump-times ", 1" 1 "uncprop1"} } */ +/* { dg-final { scan-tree-dump-times "<1\|, 1" 1 "uncprop1"} } */ typedef long unsigned int size_t; typedef union gimple_statement_d *gimple; diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c index d40a61fd725..b64e71dae22 100644 --- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c @@ -11,7 +11,7 @@ to change decisions in switch expansion which in turn can expose new jump threading opportunities. Skip the later tests on aarch64. */ /* { dg-final { scan-tree-dump-not "Jumps threaded" "dom3" { target { ! aarch64*-*-* } } } } */ -/* { dg-final { scan-tree-dump "Jumps threaded: 11" "thread2" { target { ! aarch64*-*-* } } } } */ +/* { dg-final { scan-tree-dump "Jumps threaded: 7" "thread2" { target { ! aarch64*-*-* } } } } */ /* { dg-final { scan-tree-dump "Jumps threaded: 18" "thread2" { target { aarch64*-*-* } } } } */ enum STATE { diff --git a/gcc/tree-ssa-dce.c b/gcc/tree-ssa-dce.c index 1281e67489c..dbf02c434de 100644 --- a/gcc/tree-ssa-dce.c +++ b/gcc/tree-ssa-dce.c @@ -67,6 +67,7 @@ along with GCC; see the file COPYING3. If not see #include "tree-scalar-evolution.h" #include "tree-ssa-propagate.h" #include "gimple-fold.h" +#include "tree-ssa.h" static struct stmt_stats { @@ -1612,6 +1613,164 @@ tree_dce_done (bool aggressive) worklist.release (); } +/* Sort PHI argument values for make_forwarders_with_degenerate_phis. */ + +static int +sort_phi_args (const void *a_, const void *b_) +{ + auto *a = (const std::pair *) a_; + auto *b = (const std::pair *) b_; + hashval_t ha = a->second; + hashval_t hb = b->second; + if (ha < hb) +return -1; + else if (ha > hb) +return 1; + else +return 0; +} + +/* Look for a non-virtual PHIs and make a forwarder block when all PHIs + have the sam
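The comparator above is an ordinary three-way comparison on a hash value, so sorting brings equal PHI argument values next to each other. A minimal standalone sketch of that idea follows; struct arg_hash, cmp_arg_hash and count_groups are hypothetical stand-ins, and treating each run of equal hashes as one candidate group of edges is my assumption about how the sorted result is used, since the rest of the patch is cut off here.

```c
#include <stdlib.h>

/* Hypothetical stand-in for the (edge, hash) pairs sorted in the
   patch.  */
struct arg_hash
{
  int edge_index;
  unsigned hash;
};

/* Three-way comparison on the hash only, the same shape as
   sort_phi_args.  */
static int
cmp_arg_hash (const void *a_, const void *b_)
{
  const struct arg_hash *a = a_;
  const struct arg_hash *b = b_;
  if (a->hash < b->hash)
    return -1;
  else if (a->hash > b->hash)
    return 1;
  else
    return 0;
}

/* Sort ARGS and count the runs of equal hash values.  After sorting,
   equal values are adjacent, so one linear pass finds the groups.  */
static size_t
count_groups (struct arg_hash *args, size_t n)
{
  if (n == 0)
    return 0;
  qsort (args, n, sizeof *args, cmp_arg_hash);
  size_t groups = 1;
  for (size_t i = 1; i < n; i++)
    if (args[i].hash != args[i - 1].hash)
      groups++;
  return groups;
}
```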
[committed] arc: Update (u)maddhisi4 patterns
The (u)maddhisi4 patterns use ARC's VMAC2H(U) instruction with a null destination; however, VMAC2H(U) doesn't rewrite the accumulator. This patch solves the destination issue of VMAC2H by replacing it with the DMACH(U) instruction.

gcc/
	* config/arc/arc.md (maddhisi4): Use a single move to accumulator.
	(umaddhisi4): Likewise.
	(machi): Update pattern.
	(umachi): Likewise.

gcc/testsuite/
	* gcc.target/arc/tmac-4.c: New test.

Signed-off-by: Claudiu Zissulescu
---
 gcc/config/arc/arc.md | 34 +--
 gcc/testsuite/gcc.target/arc/tmac-4.c | 29 +++
 2 files changed, 46 insertions(+), 17 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/arc/tmac-4.c

diff --git a/gcc/config/arc/arc.md b/gcc/config/arc/arc.md index 4919d275820..74ec38f1526 100644 --- a/gcc/config/arc/arc.md +++ b/gcc/config/arc/arc.md @@ -6023,26 +6023,26 @@ (define_insn "stack_irq_dwarf" (define_expand "maddhisi4" [(match_operand:SI 0 "register_operand" "") (match_operand:HI 1 "register_operand" "") - (match_operand:HI 2 "extend_operand" "") + (match_operand:HI 2 "register_operand" "") (match_operand:SI 3 "register_operand" "")] "TARGET_PLUS_MACD" "{ - rtx acc_reg = gen_rtx_REG (SImode, ACC_REG_FIRST); + rtx acc_reg = gen_rtx_REG (SImode, ACCL_REGNO); emit_move_insn (acc_reg, operands[3]); - emit_insn (gen_machi (operands[1], operands[2])); - emit_move_insn (operands[0], acc_reg); + emit_insn (gen_machi (operands[0], operands[1], operands[2], acc_reg)); DONE; }") (define_insn "machi" - [(set (reg:SI ARCV2_ACC) + [(set (match_operand:SI 0 "register_operand" "=Ral,r") (plus:SI -(mult:SI (sign_extend:SI (match_operand:HI 0 "register_operand" "%r")) - (sign_extend:SI (match_operand:HI 1 "register_operand" "r"))) -(reg:SI ARCV2_ACC)))] +(mult:SI (sign_extend:SI (match_operand:HI 1 "register_operand" "%r,r")) + (sign_extend:SI (match_operand:HI 2 "register_operand" "r,r"))) +(match_operand:SI 3 "accl_operand" ""))) + (clobber (reg:DI ARCV2_ACC))] "TARGET_PLUS_MACD" - "vmac2h\\t0,%0,%1" + "dmach\\t%0,%1,%2" 
[(set_attr "length" "4") (set_attr "type" "multi") (set_attr "predicable" "no") @@ -6056,22 +6056,22 @@ (define_expand "umaddhisi4" (match_operand:SI 3 "register_operand" "")] "TARGET_PLUS_MACD" "{ - rtx acc_reg = gen_rtx_REG (SImode, ACC_REG_FIRST); + rtx acc_reg = gen_rtx_REG (SImode, ACCL_REGNO); emit_move_insn (acc_reg, operands[3]); - emit_insn (gen_umachi (operands[1], operands[2])); - emit_move_insn (operands[0], acc_reg); + emit_insn (gen_umachi (operands[0], operands[1], operands[2], acc_reg)); DONE; }") (define_insn "umachi" - [(set (reg:SI ARCV2_ACC) + [(set (match_operand:SI 0 "register_operand" "=Ral,r") (plus:SI -(mult:SI (zero_extend:SI (match_operand:HI 0 "register_operand" "%r")) - (zero_extend:SI (match_operand:HI 1 "register_operand" "r"))) -(reg:SI ARCV2_ACC)))] +(mult:SI (zero_extend:SI (match_operand:HI 1 "register_operand" "%r,r")) + (zero_extend:SI (match_operand:HI 2 "register_operand" "r,r"))) +(match_operand:SI 3 "accl_operand" ""))) + (clobber (reg:DI ARCV2_ACC))] "TARGET_PLUS_MACD" - "vmac2hu\\t0,%0,%1" + "dmachu\\t%0,%1,%2" [(set_attr "length" "4") (set_attr "type" "multi") (set_attr "predicable" "no") diff --git a/gcc/testsuite/gcc.target/arc/tmac-4.c b/gcc/testsuite/gcc.target/arc/tmac-4.c new file mode 100644 index 000..3c6b99327a7 --- /dev/null +++ b/gcc/testsuite/gcc.target/arc/tmac-4.c @@ -0,0 +1,29 @@ +/* { dg-do compile } */ +/* { dg-skip-if "" { ! { clmcpu } } } */ +/* { dg-options "-O3 -mbig-endian -mcpu=hs38" } */ + +struct a {}; +struct b { + int c; + int d; +}; + +struct { + struct a e; + struct b f[]; +} g; +short h; + +extern void bar (int *); + +int foo(void) +{ + struct b *a; + for (;;) +{ + a = &g.f[h]; + bar(&a->d); +} +} + +/* { dg-final { scan-assembler "dmach" } } */ -- 2.31.1
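For reference, the arithmetic that the maddhisi4 and umaddhisi4 patterns are expected to produce can be modelled in portable C: extend both 16-bit operands to 32 bits, multiply, then add the 32-bit addend. The function names below are made up for illustration; this models the RTL in the patterns above, not the ARC instruction encoding or its accumulator side effects.

```c
#include <stdint.h>

/* Signed model: sign-extend A and B to 32 bits, multiply, add ACC.
   The product of two int16_t values always fits in int32_t; the final
   addition is done on uint32_t so that it wraps instead of being
   undefined on overflow.  The conversion back to int32_t relies on
   GCC's modulo behaviour for out-of-range values.  */
static int32_t
maddhisi4_model (int16_t a, int16_t b, int32_t acc)
{
  int32_t product = (int32_t) a * (int32_t) b;
  return (int32_t) ((uint32_t) product + (uint32_t) acc);
}

/* Unsigned model: zero-extend A and B to 32 bits, multiply, add ACC.  */
static uint32_t
umaddhisi4_model (uint16_t a, uint16_t b, uint32_t acc)
{
  return (uint32_t) a * (uint32_t) b + acc;
}
```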
Re: [PATCH 1/5] libstdc++: Import the fast_float library
On Tue, 16 Nov 2021 at 09:46, Florian Weimer via Libstdc++ < libstd...@gcc.gnu.org> wrote: > * Jonathan Wakely: > > > On Tue, 16 Nov 2021 at 08:01, Florian Weimer wrote: > >> > >> * Patrick Palka via Libstdc: > >> > >> > This copies the fast_float library[1] into the compiled-in library > >> > sources. We're going to use this library in our floating-point > >> > std::from_chars implementation for faster and more portable parsing of > >> > binary32/64 decimal strings. > >> > > >> > [1]: https://github.com/fastfloat/fast_float > >> > > >> > Series tested on x86_64, i686, ppc64, ppc64le and aarch64, does it > >> > look OK for trunk? > >> > >> Missing Signed-off-by:? > > > > That's not needed if Patrick is still covered by an FSF assignment. > > But the submission is not covered by the FSF assignment. > Good point. > > I think we could use Apache as well, because this code isn't going to > > appear in public headers so the problematic clause doesn't apply. But > > MIT is simpler. > > Okay, so you consider dynamic linking only? I think the historic > libstdc++ license is more permissive than Apache or MIT when used with > GCC. There aren't any notification or other requirements. > > Another good point - the Apache license is (once again) problematic here. So it's good we can choose the MIT one.
[PATCH] regrename: Skip renaming if instruction is noop move.
Skip renaming if the instruction is a noop move; it will be removed anyway, which is better for performance.

gcc/
	* regrename.c (find_rename_reg): Return satisfied regno if instruction is noop move.
---
 gcc/regrename.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/gcc/regrename.c b/gcc/regrename.c
index b8a9ca36f22..cb605f5176b 100644
--- a/gcc/regrename.c
+++ b/gcc/regrename.c
@@ -394,6 +394,9 @@ find_rename_reg (du_head_p this_head, enum reg_class super_class,
 			  this_head, *unavailable))
     return this_head->tied_chain->regno;
 
+  if (noop_move_p (this_head->first->insn))
+    return best_new_reg;
+
   /* If PREFERRED_CLASS is not NO_REGS, we iterate in the first pass
      over registers that belong to PREFERRED_CLASS and try to find the
      best register within the class. If that failed, we iterate in
-- 
2.24.3 (Apple Git-128)
Re: [PATCH][GCC] arm: add armv9-a architecture to -march
You can't make an omelette without breaking eggs, as they say. New architectures need new assemblers. However, I wonder if there's anything in v9-a that significantly affects the quality of the base multilib code needed for building the libraries. It might be that we can deal with v9-a by just mapping it to the v8-a equivalents. That would then avoid the need for an updated assembler, and reduce the build time and install footprint. R. On 16/11/2021 08:03, Christophe Lyon via Gcc-patches wrote: Hi, On Tue, Nov 9, 2021 at 12:36 PM Przemyslaw Wirkus via Gcc-patches < gcc-patches@gcc.gnu.org> wrote: -Original Message- From: Przemyslaw Wirkus Sent: 18 October 2021 10:37 To: gcc-patches@gcc.gnu.org Cc: Richard Earnshaw ; Ramana Radhakrishnan ; Kyrylo Tkachov ; ni...@redhat.com Subject: [PATCH][GCC] arm: add armv9-a architecture to -march Hi, This patch is adding `armv9-a` to -march in Arm GCC. In this patch: + Add `armv9-a` to -march. + Update multilib with armv9-a and armv9-a+simd. After this patch three additional multilib directories are available: $ arm-none-eabi-gcc --print-multi-lib .; [...vanilla multi-lib dirs...] thumb/v9-a/nofp;@mthumb@march=armv9-a@mfloat-abi=soft thumb/v9-a+simd/softfp;@mthumb@march=armv9-a+simd@mfloat- abi=softfp thumb/v9-a+simd/hard;@mthumb@march=armv9-a+simd@mfloat- abi=hard This is causing a GCC build failure when using "old" binutils (I'm using 2.36.1), because the new -march=armv9-a option is not supported. This breaks the multilib support. I don't remember how we handled similar cases in the past? Is that just "expected", and "current" GCC needs "current" binutils, or should we have a multilib list dependent on the actual binutils support? 
(I think this is not the case, and it sounds like an undesirable extra complication in an already overcrowded mutilib-Makefile) Christophe New multi-lib directories under $GCC_INSTALL_DIE/lib/gcc/arm-none-eabi/12.0.0/thumb are created: thumb/ +--- v9-a ||--- nofp | +--- v9-a+simd |--- hard |--- softfp Regtested on arm-none-eabi cross and no issues. OK for master? Thanks. commit 32ba7860ccaddd5219e6dae94a3d0653e124c9dd Ok. Thanks, Kyrill gcc/ChangeLog: * config/arm/arm-cpus.in (armv9): New define. (ARMv9a): New group. (armv9-a): New arch definition. * config/arm/arm-tables.opt: Regenerate. * config/arm/arm.h (BASE_ARCH_9A): New arch enum value. * config/arm/t-aprofile: Added armv9-a and armv9+simd. * config/arm/t-arm-elf: Added arm9-a, v9_fps and all_v9_archs to MULTILIB_MATCHES. * config/arm/t-multilib: Added v9_a_nosimd_variants and v9_a_simd_variants to MULTILIB_MATCHES. * doc/invoke.texi: Update docs. gcc/testsuite/ChangeLog: * gcc.target/arm/multilib.exp: Update test with armv9-a entries. * lib/target-supports.exp (v9a): Add new armflag. (__ARM_ARCH_9A__): Add new armdef. -- kind regards, Przemyslaw Wirkus
[PATCH] OpenMP: Ensure that offloaded variables are public
Hi,

This patch is needed for AMD GCN offloading when we use the assembler from
LLVM 13+.

The GCN runtime (libgomp+ROCm) requires that the location of every variable
in the offloaded variables table is discoverable at runtime (using the
"hsa_executable_symbol_get_info" API), and this only works when the symbols
are exported from the binary.  Previously we solved this by having mkoffload
insert ".global" directives into the assembler text, but newer LLVM
assemblers emit an error if we do this when the variable was previously
declared ".local" (which happens when a variable is zero-initialized and
placed in the BSS).

Since we can no longer easily fix them up after the fact, this patch fixes
them up during OMP lowering.

OK?

Andrew

OpenMP: Ensure that offloaded variables are public

The AMD GCN runtime loader requires that variables in the offload table are
exported (public) so that it can locate the load address and do the mapping.

gcc/ChangeLog:

	* config/gcn/mkoffload.c (process_asm): Don't add .global directives.
	* omp-offload.c (pass_omp_target_link::execute): Make offload_vars
	public.

diff --git a/gcc/config/gcn/mkoffload.c b/gcc/config/gcn/mkoffload.c
index b2e71ea5aa00..5b130cc6de71 100644
--- a/gcc/config/gcn/mkoffload.c
+++ b/gcc/config/gcn/mkoffload.c
@@ -573,10 +573,6 @@ process_asm (FILE *in, FILE *out, FILE *cfile)
                 abort ();
               obstack_int_grow (&varsizes_os, varsize);
               var_count++;
-
-              /* The HSA Runtime cannot locate the symbol if it is not
-                 exported from the kernel.  */
-              fprintf (out, "\t.global %s\n", varname);
             }
           break;
         }
diff --git a/gcc/omp-offload.c b/gcc/omp-offload.c
index 833f7ddea58f..c6fb87a5dee2 100644
--- a/gcc/omp-offload.c
+++ b/gcc/omp-offload.c
@@ -2799,6 +2799,18 @@ pass_omp_target_link::execute (function *fun)
         }
     }
 
+  /* Variables in the offload table may need to be public for the runtime
+     loader to be able to locate them.  (This is true for at least amdgcn.)  */
+  if (offload_vars)
+    for (auto it = offload_vars->begin (); it != offload_vars->end (); it++)
+      if (!TREE_PUBLIC (*it))
+        {
+          TREE_PUBLIC (*it) = 1;
+
+          if (dump_enabled_p () && dump_flags & TDF_DETAILS)
+            dump_printf (MSG_NOTE, "Make offload var public: %T\n", *it);
+        }
+
   return 0;
 }
[PATCH] middle-end/103248 - fix RDIV_EXPR handling with fixed point
This fixes the previous adjustment to operation_could_trap_helper_p
where I failed to realize that RDIV_EXPR is also used for fixed-point
types.  It also fixes that handling by properly checking for a
fixed_zerop divisor.

Bootstrapped and tested on x86_64-unknown-linux-gnu, OK?

Thanks,
Richard.

2021-11-16  Richard Biener

	PR middle-end/103248
	* tree-eh.c (operation_could_trap_helper_p): Properly handle
	fixed-point RDIV_EXPR.

	* gcc.dg/pr103248.c: New testcase.
---
 gcc/testsuite/gcc.dg/pr103248.c |  8 ++++++++
 gcc/tree-eh.c                   | 12 +++++++++---
 2 files changed, 17 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/pr103248.c

diff --git a/gcc/testsuite/gcc.dg/pr103248.c b/gcc/testsuite/gcc.dg/pr103248.c
new file mode 100644
index 000..da6232d21ee
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr103248.c
@@ -0,0 +1,8 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target fixed_point } */
+/* { dg-options "-fnon-call-exceptions" } */
+
+_Accum sa;
+int c;
+
+void div_csa() { c /= sa; }
diff --git a/gcc/tree-eh.c b/gcc/tree-eh.c
index 3eff07fc8fe..916da85af2e 100644
--- a/gcc/tree-eh.c
+++ b/gcc/tree-eh.c
@@ -2474,10 +2474,16 @@ operation_could_trap_helper_p (enum tree_code op,
       return false;
 
     case RDIV_EXPR:
-      if (honor_snans)
+      if (fp_operation)
+        {
+          if (honor_snans)
+            return true;
+          return flag_trapping_math;
+        }
+      /* Fixed point operations also use RDIV_EXPR.  */
+      if (!TREE_CONSTANT (divisor) || fixed_zerop (divisor))
         return true;
-      gcc_assert (fp_operation);
-      return flag_trapping_math;
+      return false;
 
     case LT_EXPR:
     case LE_EXPR:
-- 
2.31.1
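For readers who want to check the new RDIV_EXPR logic outside the compiler, here is a minimal standalone sketch of the decision the patched case makes. It is not GCC code: the struct and function names are made up for illustration, and the booleans merely stand in for fp_operation, honor_snans, flag_trapping_math, TREE_CONSTANT (divisor) and fixed_zerop (divisor).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the inputs the RDIV_EXPR case looks at;
   none of these names exist in GCC.  */
struct rdiv_query
{
  bool fp_operation;        /* floating-point, as opposed to fixed-point */
  bool honor_snans;
  bool flag_trapping_math;
  bool divisor_is_constant; /* models TREE_CONSTANT (divisor) */
  bool divisor_is_zero;     /* models fixed_zerop (divisor) */
};

/* Mirror the control flow of the patched RDIV_EXPR case.  */
static bool
rdiv_could_trap (const struct rdiv_query *q)
{
  /* In this model a divisor is only known to be zero if it is constant.  */
  assert (!q->divisor_is_zero || q->divisor_is_constant);

  if (q->fp_operation)
    {
      if (q->honor_snans)
        return true;
      return q->flag_trapping_math;
    }

  /* Fixed point: may trap unless the divisor is a known non-zero
     constant.  */
  return !q->divisor_is_constant || q->divisor_is_zero;
}
```

In this model the testcase's `c /= sa` corresponds to a fixed-point query with a non-constant divisor, so the function answers "could trap" instead of hitting the old gcc_assert.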
Re: Use modref kills in tree-ssa-dse
On Mon, 15 Nov 2021, Jan Hubicka wrote: > Hi, > this patch extends tree-ssa-dse to use modref kill summary to clear > live_bytes. This makes it possible to remove calls that are killed > in parts. > > I noticed that DSE duplicates the logic of tree-ssa-alias that is > mathing bases of memory accesses. Here operands_equal_p (base1, base, > OEP_ADDRESS_OF) is used. So it won't work with mismatching memref > offsets. We probably want to commonize this and add common function > that matches bases and returns offset adjustments. I wonder however if > it can catch any cases that the tree-ssa-alias code doesn't? Not sure, tree-ssa-dse.c doesn't seem to handle MEM_REF with offset? VN has adjust_offsets_for_equal_base_address for this purpose. I agree that some common functionality like bool get_relative_extent_of (const ao_ref *base, const ao_ref *ref, poly_int64 *offset); that computes [offset, offset + ref->[max_]size] of REF adjusted as to make ao_ref_base have the same address (or return false if not possible). Then [ base->offset, base->offset + base->max_size ] can be compared against that. > Other check that stmt_kills_ref_p has and tree-ssa-dse is for > non-call-exceptions. > > Bootstrapped/regtested x86_64-linux, OK? See below. > gcc/ChangeLog: > > * ipa-modref.c (get_modref_function_summary): New function. > * ipa-modref.h (get_modref_function_summary): Declare. > * tree-ssa-dse.c (clear_live_bytes_for_ref): Break out from ... > (clear_bytes_written_by): ... here; add handling of modref summary. > > gcc/testsuite/ChangeLog: > > * gcc.dg/tree-ssa/modref-dse-4.c: New test. > > diff --git a/gcc/ipa-modref.c b/gcc/ipa-modref.c > index df4612bbff9..8966f9fd2a4 100644 > --- a/gcc/ipa-modref.c > +++ b/gcc/ipa-modref.c > @@ -724,6 +724,22 @@ get_modref_function_summary (cgraph_node *func) >return r; > } > > +/* Get function summary for CALL if it exists, return NULL otherwise. > + If INTERPOSED is non-NULL set it to true if call may be interposed. 
*/ > + > +modref_summary * > +get_modref_function_summary (gcall *call, bool *interposed) > +{ > + tree callee = gimple_call_fndecl (call); > + if (!callee) > +return NULL; > + struct cgraph_node *node = cgraph_node::get (callee); > + if (!node) > +return NULL; > + if (interposed) > +*interposed = !node->binds_to_current_def_p (); > + return get_modref_function_summary (node); > +} > + > namespace { > > /* Construct modref_access_node from REF. */ > diff --git a/gcc/ipa-modref.h b/gcc/ipa-modref.h > index 9e8a30fd80a..72e608864ce 100644 > --- a/gcc/ipa-modref.h > +++ b/gcc/ipa-modref.h > @@ -50,6 +50,7 @@ struct GTY(()) modref_summary > }; > > modref_summary *get_modref_function_summary (cgraph_node *func); > +modref_summary *get_modref_function_summary (gcall *call, bool *interposed); > void ipa_modref_c_finalize (); > void ipa_merge_modref_summary_after_inlining (cgraph_edge *e); > > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/modref-dse-4.c > b/gcc/testsuite/gcc.dg/tree-ssa/modref-dse-4.c > new file mode 100644 > index 000..81aa7dc587c > --- /dev/null > +++ b/gcc/testsuite/gcc.dg/tree-ssa/modref-dse-4.c > @@ -0,0 +1,26 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O2 -fdump-tree-dse2-details" } */ > +struct a {int a,b,c;}; > +__attribute__ ((noinline)) > +void > +kill_me (struct a *a) > +{ > + a->a=0; > + a->b=0; > + a->c=0; > +} > +__attribute__ ((noinline)) > +void > +my_pleasure (struct a *a) > +{ > + a->a=1; > + a->c=2; > +} > +void > +set (struct a *a) > +{ > + kill_me (a); > + my_pleasure (a); > + a->b=1; > +} > +/* { dg-final { scan-tree-dump "Deleted dead store: kill_me" "dse2" } } */ > diff --git a/gcc/tree-ssa-dse.c b/gcc/tree-ssa-dse.c > index ce0083a6dab..d2f54b0faad 100644 > --- a/gcc/tree-ssa-dse.c > +++ b/gcc/tree-ssa-dse.c > @@ -209,6 +209,24 @@ normalize_ref (ao_ref *copy, ao_ref *ref) >return true; > } > > +/* Update LIVE_BYTES tracking REF for write to WRITE: > + Verify we have the same base memory address, the write > + has a known size 
and overlaps with REF. */ > +static void > +clear_live_bytes_for_ref (sbitmap live_bytes, ao_ref *ref, ao_ref *write) > +{ > + HOST_WIDE_INT start, size; > + > + if (valid_ao_ref_for_dse (write) > + && operand_equal_p (write->base, ref->base, OEP_ADDRESS_OF) > + && known_eq (write->size, write->max_size) > + && normalize_ref (write, ref) normalize_ref alters 'write', I think we should work on a local copy here. See live_bytes_read which takes a copy of 'use_ref'. Otherwise looks good to me. Thanks, Richard. > + && (write->offset - ref->offset).is_constant (&start) > + && write->size.is_constant (&size)) > +bitmap_clear_range (live_bytes, start / BITS_PER_UNIT, > + size / BITS_PER_UNIT); > +} > + > /* Clear any bytes written by STMT from the bitmap LIVE_BYTES. The base > address written by STMT must mat
Re: [AArch64] Enable generation of FRINTNZ instructions
On Fri, 12 Nov 2021, Andre Simoes Dias Vieira wrote: > > On 12/11/2021 10:56, Richard Biener wrote: > > On Thu, 11 Nov 2021, Andre Vieira (lists) wrote: > > > >> Hi, > >> > >> This patch introduces two IFN's FTRUNC32 and FTRUNC64, the corresponding > >> optabs and mappings. It also creates a backend pattern to implement them > >> for > >> aarch64 and a match.pd pattern to idiom recognize these. > >> These IFN's (and optabs) represent a truncation towards zero, as if > >> performed > >> by first casting it to a signed integer of 32 or 64 bits and then back to > >> the > >> same floating point type/mode. > >> > >> The match.pd pattern choses to use these, when supported, regardless of > >> trapping math, since these new patterns mimic the original behavior of > >> truncating through an integer. > >> > >> I didn't think any of the existing IFN's represented these. I know it's a > >> bit > >> late in stage 1, but I thought this might be OK given it's only used by a > >> single target and should have very little impact on anything else. > >> > >> Bootstrapped on aarch64-none-linux. > >> > >> OK for trunk? > > On the RTL side ftrunc32/ftrunc64 would probably be better a conversion > > optab (with two modes), so not > > > > +OPTAB_D (ftrunc32_optab, "ftrunc$asi2") > > +OPTAB_D (ftrunc64_optab, "ftrunc$adi2") > > > > but > > > > OPTAB_CD (ftrunc_shrt_optab, "ftrunc$a$I$b2") > > > > or so? I know that gets somewhat awkward for the internal function, > > but IMHO we shouldn't tie our hands because of that? > I tried doing this originally, but indeed I couldn't find a way to correctly > tie the internal function to it. > > direct_optab_supported_p with multiple types expect those to be of the same > mode. I see convert_optab_supported_p does but I don't know how that is > used... > > Any ideas? No "nice" ones. The "usual" way is to provide fake arguments that specify the type/mode. 
We could use an integer argument directly specifying the mode (then the IL would look host dependent - ugh), or specify a constant zero in the intended mode (less visibly obvious - but at least with -gimple dumping you'd see the type...). In any case if people think going with two optabs is OK then please consider using ftruncsi and ftruncdi instead of 32/64. Richard.
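As a reference for the semantics under discussion ("as if performed by first casting it to a signed integer of 32 or 64 bits and then back"), here is a small C sketch. The function names are hypothetical and this only models in-range inputs: the C cast is undefined for out-of-range values and NaN, so the sketch rejects them rather than guessing what the IFN or the AArch64 instruction would produce.

```c
#include <assert.h>
#include <stdint.h>

/* Reference semantics only: truncate towards zero by a round trip
   through a signed 32-bit integer.  Inputs whose truncated value does
   not fit in int32_t (and NaN, which fails both comparisons) are
   rejected instead of being modelled.  */
static double
ftrunc32_ref (double x)
{
  assert (x > -2147483649.0 && x < 2147483648.0);
  return (double) (int32_t) x;
}

/* Same idea with a signed 64-bit intermediate.  */
static double
ftrunc64_ref (double x)
{
  assert (x >= -9223372036854775808.0 && x < 9223372036854775808.0);
  return (double) (int64_t) x;
}
```

For example, ftrunc32_ref (2.75) gives 2.0 and ftrunc32_ref (-2.75) gives -2.0, while 4294967296.5 is only accepted by the 64-bit variant.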
Re: [PATCH] regrename: Skip renaming if instruction is noop move.
On Tue, Nov 16, 2021 at 12:45 PM Jojo R via Gcc-patches wrote: > > Skip renaming if instruction is noop move, and it will > been removed for performance. Is there any (target specific) testcase you can add? Such commits are problematic when later bisected to since the intent isn't clear. > gcc/ > * regrename.c (find_rename_reg): Return satisfied regno > if instruction is noop move. > --- > gcc/regrename.c | 3 +++ > 1 file changed, 3 insertions(+) > > diff --git a/gcc/regrename.c b/gcc/regrename.c > index b8a9ca36f22..cb605f5176b 100644 > --- a/gcc/regrename.c > +++ b/gcc/regrename.c > @@ -394,6 +394,9 @@ find_rename_reg (du_head_p this_head, enum reg_class > super_class, > this_head, *unavailable)) > return this_head->tied_chain->regno; > > + if (noop_move_p (this_head->first->insn)) > +return best_new_reg; > + >/* If PREFERRED_CLASS is not NO_REGS, we iterate in the first pass > over registers that belong to PREFERRED_CLASS and try to find the > best register within the class. If that failed, we iterate in > -- > 2.24.3 (Apple Git-128) >
Re: [PATCH 5/5] vect: Support masked gather loads with SLP
On Fri, Nov 12, 2021 at 7:06 PM Richard Sandiford via Gcc-patches wrote: > > This patch extends the previous SLP gather load support so > that it can handle masked loads too. > > Regstrapped on aarch64-linux-gnu and x86_64-linux-gnu. OK to install? OK. Thanks, Richard. > Richard > > > gcc/ > * tree-vect-slp.c (arg1_arg4_map): New variable. > (vect_get_operand_map): Handle IFN_MASK_GATHER_LOAD. > (vect_build_slp_tree_1): Likewise. > (vect_build_slp_tree_2): Likewise. > * tree-vect-stmts.c (vectorizable_load): Expect the mask to be > the last SLP child node rather than the first. > > gcc/testsuite/ > * gcc.dg/vect/vect-gather-3.c: New test. > * gcc.dg/vect/vect-gather-4.c: Likewise. > * gcc.target/aarch64/sve/mask_gather_load_8.c: Likewise. > --- > gcc/testsuite/gcc.dg/vect/vect-gather-3.c | 64 ++ > gcc/testsuite/gcc.dg/vect/vect-gather-4.c | 48 ++ > .../aarch64/sve/mask_gather_load_8.c | 65 +++ > gcc/tree-vect-slp.c | 15 - > gcc/tree-vect-stmts.c | 21 -- > 5 files changed, 203 insertions(+), 10 deletions(-) > create mode 100644 gcc/testsuite/gcc.dg/vect/vect-gather-3.c > create mode 100644 gcc/testsuite/gcc.dg/vect/vect-gather-4.c > create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/mask_gather_load_8.c > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-gather-3.c > b/gcc/testsuite/gcc.dg/vect/vect-gather-3.c > new file mode 100644 > index 000..738bd3f3106 > --- /dev/null > +++ b/gcc/testsuite/gcc.dg/vect/vect-gather-3.c > @@ -0,0 +1,64 @@ > +#include "tree-vect.h" > + > +#define N 16 > + > +void __attribute__((noipa)) > +f (int *restrict y, int *restrict x, int *restrict indices) > +{ > + for (int i = 0; i < N; ++i) > +{ > + y[i * 2] = (indices[i * 2] < N * 2 > + ? x[indices[i * 2]] + 1 > + : 1); > + y[i * 2 + 1] = (indices[i * 2 + 1] < N * 2 > + ? 
x[indices[i * 2 + 1]] + 2 > + : 2); > +} > +} > + > +int y[N * 2]; > +int x[N * 2] = { > + 72704, 52152, 51301, 96681, > + 57937, 60490, 34504, 60944, > + 42225, 28333, 88336, 74300, > + 29250, 20484, 38852, 91536, > + 86917, 63941, 31590, 21998, > + 22419, 26974, 28668, 13968, > + 3451, 20247, 44089, 85521, > + 22871, 87362, 50555, 85939 > +}; > +int indices[N * 2] = { > + 15, 0x1, 0xcafe0, 19, > + 7, 22, 19, 1, > + 0x2, 0x7, 15, 30, > + 5, 12, 11, 11, > + 10, 25, 5, 20, > + 22, 24, 32, 28, > + 30, 19, 6, 0xabcdef, > + 7, 12, 8, 21 > +}; > +int expected[N * 2] = { > + 91537, 2, 1, 22000, > + 60945, 28670, 21999, 52154, > + 1, 2, 91537, 50557, > + 60491, 29252, 74301, 74302, > + 88337, 20249, 60491, 22421, > + 28669, 3453, 1, 22873, > + 50556, 22000, 34505, 2, > + 60945, 29252, 42226, 26976 > +}; > + > +int > +main (void) > +{ > + check_vect (); > + > + f (y, x, indices); > + for (int i = 0; i < 32; ++i) > +if (y[i] != expected[i]) > + __builtin_abort (); > + > + return 0; > +} > + > +/* { dg-final { scan-tree-dump "Loop contains only SLP stmts" vect { target > { vect_gather_load_ifn && vect_masked_load } } } } */ > diff --git a/gcc/testsuite/gcc.dg/vect/vect-gather-4.c > b/gcc/testsuite/gcc.dg/vect/vect-gather-4.c > new file mode 100644 > index 000..ee2e4e4999a > --- /dev/null > +++ b/gcc/testsuite/gcc.dg/vect/vect-gather-4.c > @@ -0,0 +1,48 @@ > +/* { dg-do compile } */ > + > +#define N 16 > + > +void > +f1 (int *restrict y, int *restrict x1, int *restrict x2, > +int *restrict indices) > +{ > + for (int i = 0; i < N; ++i) > +{ > + y[i * 2] = (indices[i * 2] < N * 2 > + ? x1[indices[i * 2]] + 1 > + : 1); > + y[i * 2 + 1] = (indices[i * 2 + 1] < N * 2 > + ? x2[indices[i * 2 + 1]] + 2 > + : 2); > +} > +} > + > +void > +f2 (int *restrict y, int *restrict x, int *restrict indices) > +{ > + for (int i = 0; i < N; ++i) > +{ > + y[i * 2] = (indices[i * 2] < N * 2 > + ? x[indices[i * 2]] + 1 > + : 1); > + y[i * 2 + 1] = (indices[i * 2 + 1] < N * 2 > + ? 
x[indices[i * 2 + 1] * 2] + 2 > + : 2); > +} > +} > + > +void > +f3 (int *restrict y, int *restrict x, int *restrict indices) > +{ > + for (int i = 0; i < N; ++i) > +{ > + y[i * 2] = (indices[i * 2] < N * 2 > + ? x[indices[i * 2]] + 1 > + : 1); > + y[i * 2 + 1] = (indices[i * 2 + 1] < N * 2 > + ? x[(unsigned int) indices[i * 2 + 1]] + 2 > + : 2); > +} > +} > + > +/* { dg-final { scan-tree-dump-not "Loop contains only SLP stmts" vect { > target vect_gather_load_ifn } } } */ > diff --git a/gcc/testsuite/gcc.target/aarch64/sve/mask_gather_load_8.c > b/gcc/testsuite/gcc.target/aarch64/sve/ma
Re: [PATCH][GCC] arm: add armv9-a architecture to -march
Hi There, I think for AArch32 mapping it back to armv8-a sounds sufficient. Unless we have string or math routines in newlib that make use of any ACLE guards that are beyond armv8-a … Ramana From: Richard Earnshaw Date: Tuesday, 16 November 2021 at 11:48 To: Christophe Lyon , Przemyslaw Wirkus Cc: Ramana Radhakrishnan , gcc-patches@gcc.gnu.org , Richard Earnshaw Subject: Re: [PATCH][GCC] arm: add armv9-a architecture to -march You can't make an omelette without breaking eggs, as they say. New architectures need new assemblers. However, I wonder if there's anything in v9-a that significantly affects the quality of the base multilib code needed for building the libraries. It might be that we can deal with v9-a by just mapping it to the v8-a equivalents. That would then avoid the need for an updated assembler, and reduce the build time and install footprint. R. On 16/11/2021 08:03, Christophe Lyon via Gcc-patches wrote: > Hi, > > > On Tue, Nov 9, 2021 at 12:36 PM Przemyslaw Wirkus via Gcc-patches < > gcc-patches@gcc.gnu.org> wrote: > > -Original Message- > From: Przemyslaw Wirkus > Sent: 18 October 2021 10:37 > To: gcc-patches@gcc.gnu.org > Cc: Richard Earnshaw ; Ramana > Radhakrishnan ; Kyrylo Tkachov > ; ni...@redhat.com > Subject: [PATCH][GCC] arm: add armv9-a architecture to -march > > Hi, > > This patch is adding `armv9-a` to -march in Arm GCC. > > In this patch: >+ Add `armv9-a` to -march. >+ Update multilib with armv9-a and armv9-a+simd. > > After this patch three additional multilib directories are available: > > $ arm-none-eabi-gcc --print-multi-lib .; [...vanilla multi-lib > dirs...] thumb/v9-a/nofp;@mthumb@march=armv9-a@mfloat-abi=soft > thumb/v9-a+simd/softfp;@mthumb@march=armv9-a+simd@mfloat- > abi=softfp > thumb/v9-a+simd/hard;@mthumb@march=armv9-a+simd@mfloat- > abi=hard > >> > > This is causing a GCC build failure when using "old" binutils (I'm using > 2.36.1), > because the new -march=armv9-a option is not supported. 
This breaks the > multilib support. > > I don't remember how we handled similar cases in the past? Is that just > "expected", and > "current" GCC needs "current" binutils, or should we have a multilib list > dependent on > the actual binutils support? (I think this is not the case, and it sounds > like an undesirable > extra complication in an already overcrowded mutilib-Makefile) > > Christophe > New multi-lib directories under > $GCC_INSTALL_DIE/lib/gcc/arm-none-eabi/12.0.0/thumb are created: > > thumb/ > +--- v9-a > ||--- nofp > | > +--- v9-a+simd > |--- hard > |--- softfp > > Regtested on arm-none-eabi cross and no issues. > > OK for master? >> >> Thanks. >> >> commit 32ba7860ccaddd5219e6dae94a3d0653e124c9dd >> >>> Ok. >>> Thanks, >>> Kyrill >>> >>> > > gcc/ChangeLog: > >* config/arm/arm-cpus.in (armv9): New define. >(ARMv9a): New group. >(armv9-a): New arch definition. >* config/arm/arm-tables.opt: Regenerate. >* config/arm/arm.h (BASE_ARCH_9A): New arch enum value. >* config/arm/t-aprofile: Added armv9-a and armv9+simd. >* config/arm/t-arm-elf: Added arm9-a, v9_fps and all_v9_archs >to MULTILIB_MATCHES. >* config/arm/t-multilib: Added v9_a_nosimd_variants and >v9_a_simd_variants to MULTILIB_MATCHES. >* doc/invoke.texi: Update docs. > > gcc/testsuite/ChangeLog: > >* gcc.target/arm/multilib.exp: Update test with armv9-a entries. >* lib/target-supports.exp (v9a): Add new armflag. >(__ARM_ARCH_9A__): Add new armdef. > > -- > kind regards, > Przemyslaw Wirkus >> >>
Re: Use modref kills in tree-ssa-dse
> 
> Not sure, tree-ssa-dse.c doesn't seem to handle MEM_REF with offset?
> 
> VN has adjust_offsets_for_equal_base_address for this purpose.  I
> agree that some common functionality like
> 
> bool
> get_relative_extent_of (const ao_ref *base, const ao_ref *ref,
>                         poly_int64 *offset);
> 
> that computes [offset, offset + ref->[max_]size] of REF adjusted as to
> make ao_ref_base have the same address (or return false if not
> possible).  Then [ base->offset, base->offset + base->max_size ]
> can be compared against that.

OK, I will look into that.

> > +  if (valid_ao_ref_for_dse (write)
> > +      && operand_equal_p (write->base, ref->base, OEP_ADDRESS_OF)
> > +      && known_eq (write->size, write->max_size)
> > +      && normalize_ref (write, ref)
> 
> normalize_ref alters 'write', I think we should work on a local
> copy here.  See live_bytes_read which takes a copy of 'use_ref'.

We never process the same write twice (get_ao_ref always constructs a
fresh copy), so this should be safe.  Or shall I turn the write
parameter into "ao_ref write" instead of "ao_ref *write" just to be sure
we do not break this in the future?

Thank you,
Honza
Re: [PATCH] ivopts: Improve code generated for very simple loops.
On Mon, Nov 15, 2021 at 2:04 PM Roger Sayle wrote: > > > This patch tidies up the code that GCC generates for simple loops, > by selecting/generating a simpler loop bound expression in ivopts. > The original motivation came from looking at the following loop (from > gcc.target/i386/pr90178.c) > > int *find_ptr (int* mem, int sz, int val) > { > for (int i = 0; i < sz; i++) > if (mem[i] == val) > return &mem[i]; > return 0; > } > > which GCC currently compiles to: > > find_ptr: > movq%rdi, %rax > testl %esi, %esi > jle .L4 > leal-1(%rsi), %ecx > leaq4(%rdi,%rcx,4), %rcx > jmp .L3 > .L7:addq$4, %rax > cmpq%rcx, %rax > je .L4 > .L3:cmpl%edx, (%rax) > jne .L7 > ret > .L4:xorl%eax, %eax > ret > > Notice the relatively complex leal/leaq instructions, that result > from ivopts using the following expression for the loop bound: > inv_expr 2: ((unsigned long) ((unsigned int) sz_8(D) + 4294967295) > * 4 + (unsigned long) mem_9(D)) + 4 > > which results from NITERS being (unsigned int) sz_8(D) + 4294967295, > i.e. (sz - 1), and the logic in cand_value_at determining the bound > as BASE + NITERS*STEP at the start of the final iteration and as > BASE + NITERS*STEP + STEP at the end of the final iteration. > > Ideally, we'd like the middle-end optimizers to simplify > BASE + NITERS*STEP + STEP as BASE + (NITERS+1)*STEP, especially > when NITERS already has the form BOUND-1, but with type conversions > and possible overflow to worry about, the above "inv_expr 2" is the > best that can be done by fold (without additional context information). > > This patch improves ivopts' cand_value_at by instead of using just > the tree expression for NITERS, passing the data structure that > explains how that expression was derived. This allows us to peek > under the surface to check that NITERS+1 doesn't overflow, and in > this patch to use the SSA_NAME already holding the required value. 
> > In the motivating loop above, inv_expr 2 now becomes: > (unsigned long) sz_8(D) * 4 + (unsigned long) mem_9(D) > > And as a result, on x86_64 we now generate: > > find_ptr: > movq%rdi, %rax > testl %esi, %esi > jle .L4 > movslq %esi, %rsi > leaq(%rdi,%rsi,4), %rcx > jmp .L3 > .L7:addq$4, %rax > cmpq%rcx, %rax > je .L4 > .L3:cmpl%edx, (%rax) > jne .L7 > ret > .L4:xorl%eax, %eax > ret > > > This improvement required one minor tweak to GCC's testsuite for > gcc.dg/wrapped-binop-simplify.c, where we again generate better > code, and therefore no longer find as many optimization opportunities > in later passes (vrp2). > > Previously: > > void v1 (unsigned long *in, unsigned long *out, unsigned int n) > { > int i; > for (i = 0; i < n; i++) { > out[i] = in[i]; > } > } > > on x86_64 generated: > v1: testl %edx, %edx > je .L1 > movl%edx, %edx > xorl%eax, %eax > .L3:movq(%rdi,%rax,8), %rcx > movq%rcx, (%rsi,%rax,8) > addq$1, %rax > cmpq%rax, %rdx > jne .L3 > .L1:ret > > and now instead generates: > v1: testl %edx, %edx > je .L1 > movl%edx, %edx > xorl%eax, %eax > leaq0(,%rdx,8), %rcx > .L3:movq(%rdi,%rax), %rdx > movq%rdx, (%rsi,%rax) > addq$8, %rax > cmpq%rax, %rcx > jne .L3 > .L1:ret Is that actually better? IIRC the addressing modes are both complex and we now have an extra lea? For this case I see we generate _15 = n_10(D) + 4294967295; _8 = (unsigned long) _15; _7 = _8 + 1; where n is unsigned int so if we know that n is not zero we can simplify the addition and conveniently the loop header test provides this guarantee. IIRC there were some attempts to enhance match.pd for some cases of such expressions. > > This patch has been tested on x86_64-pc-linux-gnu with a make bootstrap > and make -k check with no new failures. Ok for mainline? 
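To see why the overflow check matters before rewriting BASE + NITERS*STEP + STEP as BASE + (NITERS+1)*STEP, here is a small standalone sketch of the two bound computations for a 4-byte step, with BASE left out. The function names are invented for illustration, and sz is modelled directly as a 32-bit unsigned value.

```c
#include <stdint.h>

/* Bound in the shape of the old "inv_expr 2": NITERS is computed in
   32 bits as sz - 1, widened, scaled by the step, and one more step
   is added afterwards.  */
static uint64_t
bound_niters_plus_step (uint32_t sz)
{
  uint32_t niters = sz - 1u;   /* wraps to 0xffffffff when sz == 0 */
  return (uint64_t) niters * 4 + 4;
}

/* Bound in the shape of the new "inv_expr 2": scale sz directly.  */
static uint64_t
bound_direct (uint32_t sz)
{
  return (uint64_t) sz * 4;
}
```

The two agree for every non-zero sz, but for sz == 0 the first form yields 0x400000000 and the second yields 0. The rewrite is therefore only valid when something guarantees the no-iterations case never reaches the bound computation, which is what the loop header test provides here.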
+ /* If AFTER_ADJUST is required, the code below generates the equivalent + * of BASE + NITER * STEP + STEP, when ideally we'd prefer the expression + * BASE + (NITER + 1) * STEP, especially when NITER is often of the form + * SSA_NAME - 1. Unfortunately, guaranteeing that adding 1 to NITER + * doesn't overflow is tricky, so we peek inside the TREE_NITER_DESC + * class for common idioms that we know are safe. */ No '* ' each line. + if (after_adjust + && desc->control.no_overflow + && integer_onep (desc->control.step) + && integer_onep (desc->control.base) + && desc->cmp == LT_EXPR + && TREE_CODE (desc->bound) == SSA_NAME) +{ + niter = desc->bound; + after_adjust = false; +} I wonder if the non-overflo
Re: Use modref kills in tree-ssa-dse
On Tue, 16 Nov 2021, Jan Hubicka wrote:

> > 
> > Not sure, tree-ssa-dse.c doesn't seem to handle MEM_REF with offset?
> > 
> > VN has adjust_offsets_for_equal_base_address for this purpose.  I
> > agree that some common functionality like
> > 
> > bool
> > get_relative_extent_of (const ao_ref *base, const ao_ref *ref,
> >                         poly_int64 *offset);
> > 
> > that computes [offset, offset + ref->[max_]size] of REF adjusted as to
> > make ao_ref_base have the same address (or return false if not
> > possible).  Then [ base->offset, base->offset + base->max_size ]
> > can be compared against that.
> 
> OK, I will look into that.
> 
> > > +  if (valid_ao_ref_for_dse (write)
> > > +      && operand_equal_p (write->base, ref->base, OEP_ADDRESS_OF)
> > > +      && known_eq (write->size, write->max_size)
> > > +      && normalize_ref (write, ref)
> > 
> > normalize_ref alters 'write', I think we should work on a local
> > copy here.  See live_bytes_read which takes a copy of 'use_ref'.
> 
> We never process the same write twice (get_ao_ref always constructs a
> fresh copy), so this should be safe.  Or shall I turn the write
> parameter into "ao_ref write" instead of "ao_ref *write" just to be sure
> we do not break this in the future?

Yes.

Thanks,
Richard.
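The by-value variant agreed on above can be illustrated with a toy model of the live-bytes update. This is only a sketch under simplifying assumptions: toy_ref and toy_clear_live_bytes are invented names, offsets and sizes are plain byte-aligned bit counts instead of poly_int64, the bases are assumed to have been matched already, and a bool array replaces the sbitmap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_BITS_PER_UNIT 8

/* Toy stand-in for an ao_ref whose base is already known to match:
   OFFSET and SIZE are in bits.  Not a GCC type.  */
struct toy_ref
{
  int64_t offset;
  int64_t size;
};

/* Clear from LIVE (one flag per byte of REF) every byte that WRITE
   overwrites.  WRITE is passed by value, so clamping it to REF cannot
   leak back to the caller.  Returns false if the two do not overlap.  */
static bool
toy_clear_live_bytes (bool *live, struct toy_ref ref, struct toy_ref write)
{
  /* This sketch only handles byte-aligned references.  */
  assert (ref.offset % TOY_BITS_PER_UNIT == 0
          && ref.size % TOY_BITS_PER_UNIT == 0
          && write.offset % TOY_BITS_PER_UNIT == 0
          && write.size % TOY_BITS_PER_UNIT == 0);

  /* Clamp WRITE to the extent of REF.  */
  int64_t ref_end = ref.offset + ref.size;
  int64_t write_end = write.offset + write.size;
  if (write.offset < ref.offset)
    write.offset = ref.offset;
  if (write_end > ref_end)
    write_end = ref_end;
  if (write.offset >= write_end)
    return false;
  write.size = write_end - write.offset;

  int64_t start = (write.offset - ref.offset) / TOY_BITS_PER_UNIT;
  int64_t count = write.size / TOY_BITS_PER_UNIT;
  for (int64_t i = 0; i < count; i++)
    live[start + i] = false;
  return true;
}
```

With a 12-byte reference standing for the stores in kill_me from modref-dse-4.c (assuming 4-byte int), clearing the ranges for a->a, a->c and a->b leaves no live bytes, which is the situation in which the call can be deleted.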
Re: [PATCH 3/3] elf: Add _dl_find_eh_frame function
On 03/11/2021 13:28, Florian Weimer via Gcc-patches wrote: > This function is similar to __gnu_Unwind_Find_exidx as used on arm. > It can be used to speed up the libgcc unwinder. Besides the terse patch description, the design seems ok to accomplish the lock-free read and update. There are some question and remarks below, and I still need to revise the tests. However the code is somewhat complex and I would like to have some feedback if gcc will be willing to accept this change (I assume it would require this code merge on glibc beforehand). > --- > NEWS | 4 + > bits/dlfcn_eh_frame.h | 33 + > dlfcn/Makefile| 2 +- > dlfcn/dlfcn.h | 2 + > elf/Makefile | 31 +- > elf/Versions | 3 + > elf/dl-close.c| 4 + > elf/dl-find_eh_frame.c| 864 ++ > elf/dl-find_eh_frame.h| 90 ++ > elf/dl-find_eh_frame_slow.h | 55 ++ > elf/dl-libc_freeres.c | 2 + > elf/dl-open.c | 5 + > elf/rtld.c| 7 + > elf/tst-dl_find_eh_frame-mod1.c | 10 + > elf/tst-dl_find_eh_frame-mod2.c | 10 + > elf/tst-dl_find_eh_frame-mod3.c | 10 + > elf/tst-dl_find_eh_frame-mod4.c | 10 + > elf/tst-dl_find_eh_frame-mod5.c | 11 + > elf/tst-dl_find_eh_frame-mod6.c | 11 + > elf/tst-dl_find_eh_frame-mod7.c | 10 + > elf/tst-dl_find_eh_frame-mod8.c | 10 + > elf/tst-dl_find_eh_frame-mod9.c | 10 + > elf/tst-dl_find_eh_frame-threads.c| 237 + > elf/tst-dl_find_eh_frame.c| 179 > include/atomic_wide_counter.h | 14 + > include/bits/dlfcn_eh_frame.h | 1 + > include/link.h| 3 + > manual/Makefile | 2 +- > manual/dynlink.texi | 69 ++ > manual/libdl.texi | 10 - > manual/probes.texi| 2 +- > manual/threads.texi | 2 +- > sysdeps/i386/bits/dlfcn_eh_frame.h| 34 + > sysdeps/mach/hurd/i386/ld.abilist | 1 + > sysdeps/nios2/bits/dlfcn_eh_frame.h | 34 + > sysdeps/unix/sysv/linux/aarch64/ld.abilist| 1 + > sysdeps/unix/sysv/linux/alpha/ld.abilist | 1 + > sysdeps/unix/sysv/linux/arc/ld.abilist| 1 + > sysdeps/unix/sysv/linux/arm/be/ld.abilist | 1 + > sysdeps/unix/sysv/linux/arm/le/ld.abilist | 1 + > sysdeps/unix/sysv/linux/csky/ld.abilist | 1 + > 
sysdeps/unix/sysv/linux/hppa/ld.abilist | 1 + > sysdeps/unix/sysv/linux/i386/ld.abilist | 1 + > sysdeps/unix/sysv/linux/ia64/ld.abilist | 1 + > .../unix/sysv/linux/m68k/coldfire/ld.abilist | 1 + > .../unix/sysv/linux/m68k/m680x0/ld.abilist| 1 + > sysdeps/unix/sysv/linux/microblaze/ld.abilist | 1 + > .../unix/sysv/linux/mips/mips32/ld.abilist| 1 + > .../sysv/linux/mips/mips64/n32/ld.abilist | 1 + > .../sysv/linux/mips/mips64/n64/ld.abilist | 1 + > sysdeps/unix/sysv/linux/nios2/ld.abilist | 1 + > .../sysv/linux/powerpc/powerpc32/ld.abilist | 1 + > .../linux/powerpc/powerpc64/be/ld.abilist | 1 + > .../linux/powerpc/powerpc64/le/ld.abilist | 1 + > sysdeps/unix/sysv/linux/riscv/rv32/ld.abilist | 1 + > sysdeps/unix/sysv/linux/riscv/rv64/ld.abilist | 1 + > .../unix/sysv/linux/s390/s390-32/ld.abilist | 1 + > .../unix/sysv/linux/s390/s390-64/ld.abilist | 1 + > sysdeps/unix/sysv/linux/sh/be/ld.abilist | 1 + > sysdeps/unix/sysv/linux/sh/le/ld.abilist | 1 + > .../unix/sysv/linux/sparc/sparc32/ld.abilist | 1 + > .../unix/sysv/linux/sparc/sparc64/ld.abilist | 1 + > sysdeps/unix/sysv/linux/x86_64/64/ld.abilist | 1 + > sysdeps/unix/sysv/linux/x86_64/x32/ld.abilist | 1 + > 64 files changed, 1795 insertions(+), 16 deletions(-) > create mode 100644 bits/dlfcn_eh_frame.h > create mode 100644 elf/dl-find_eh_frame.c > create mode 100644 elf/dl-find_eh_frame.h > create mode 100644 elf/dl-find_eh_frame_slow.h > create mode 100644 elf/tst-dl_find_eh_frame-mod1.c > create mode 100644 elf/tst-dl_find_eh_frame-mod2.c > create mode 100644 elf/tst-dl_find_eh_frame-mod3.c > create mode 100644 elf/tst-dl_find_eh_frame-mod4.c > create mode 100644 elf/tst-dl_find_eh_frame-mod5.c > create mode 100644 elf/tst-dl_find_eh_frame-mod6.c > create mode 100644 elf/tst-dl_find_eh_frame-mod7.c > create mode 100644 elf/tst-dl_find_eh_frame-mod8.c > create mode 100644 elf/tst-dl_find_eh_frame-mod9.c > create mode 100644 elf/tst-d
[PATCH, v5, OpenMP 5.0] Improve OpenMP target support for C++ [PR92120 v5]
Hi Jakub, On 2021/6/24 9:15 PM, Jakub Jelinek wrote: On Fri, Jun 18, 2021 at 10:25:16PM +0800, Chung-Lin Tang wrote: Note, you'll need to rebase your patch, it clashes with r12-1768-g7619d33471c10fe3d149dcbb701d99ed3dd23528. Sorry for that. And sorry for patch review delay. --- a/gcc/c/c-typeck.c +++ b/gcc/c/c-typeck.c @@ -13104,6 +13104,12 @@ handle_omp_array_sections_1 (tree c, tree t, vec &types, return error_mark_node; } t = TREE_OPERAND (t, 0); + if ((ort == C_ORT_ACC || ort == C_ORT_OMP) Map clauses never appear on declare simd, so (ort == C_ORT_ACC || ort == C_ORT_OMP) previously meant always and since the in_reduction change is incorrect (as C_ORT_OMP_TARGET is used for target construct but not for e.g. target data* or target update). + && TREE_CODE (t) == MEM_REF) Upon reviewing, it appears that most of these C_ORT_* tests are no longer needed, removed in new patch. So please just use if (TREE_CODE (t) == MEM_REF) or explain when it shouldn't trigger. @@ -14736,6 +14743,11 @@ c_finish_omp_clauses (tree clauses, enum c_omp_region_type ort) { while (TREE_CODE (t) == COMPONENT_REF) t = TREE_OPERAND (t, 0); + if (TREE_CODE (t) == MEM_REF) + { + t = TREE_OPERAND (t, 0); + STRIP_NOPS (t); + } This doesn't look correct. At least the parsing (and the spec AFAIK) doesn't ensure that if there is ->, it must come before all the dots. So, if one uses map (s->x.y) the above would work, but if map (s->x.y->z) or map (s.a->b->c->d->e) is used, it wouldn't. I'd expect a single while loop that looks through COMPONENT_REFs and MEM_REFs as they appear. Maybe the handle_omp_array_sections_1 MEM_REF case too? Or do you want to have it done incrementally, start with supporting only a single -> first before all the dots and later on add support for the rest? 
I think the 5.0 and especially 5.1 wording basically says that map clause operand is arbitrary lvalue expression that includes array section support too, so eventually we should just have somewhere in parsing scope a bool whether OpenMP array sections are allowed or not, add OMP_ARRAY_REF or similar tree code for those and after parsing the expression, ensure array sections appear only where they can appear and for a subset of the lvalue expressions where we have decl plus series of -> field or . field or [ index ] or [ array section stuff ] handle those specially. That arbitrary lvalue can certainly be done incrementally. map (foo(123)->a.b[3]->c.d[:7]) and the like. Indeed this kind of modification is sort of "as encountered", so there are probably many cases that are not completely handled yet; it's not just the front-end, but also changes in gimplify_scan_omp_clauses(). However, I had another patch that should've plowed a bit further on this: https://gcc.gnu.org/pipermail/gcc-patches/2021-May/570075.html as well as those patch sets that Julian is working on. (our current plan is to have my sets go in first, and Julian's on top, to minimize clashing) if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_MAP && OMP_CLAUSE_MAP_IMPLICIT (c) && (bitmap_bit_p (&map_head, DECL_UID (t)) @@ -14802,6 +14814,15 @@ c_finish_omp_clauses (tree clauses, enum c_omp_region_type ort) bias) to zero here, so it is not set erroneously to the pointer size later on in gimplify.c. */ OMP_CLAUSE_SIZE (c) = size_zero_node; + indir_component_ref_p = false; + if ((ort == C_ORT_ACC || ort == C_ORT_OMP) Same comment about ort tests. 
+ && TREE_CODE (t) == COMPONENT_REF + && TREE_CODE (TREE_OPERAND (t, 0)) == MEM_REF) + { + t = TREE_OPERAND (TREE_OPERAND (t, 0), 0); + indir_component_ref_p = true; + STRIP_NOPS (t); + } Again, this can handle only a single -> @@ -42330,16 +42328,10 @@ cp_parser_omp_target (cp_parser *parser, cp_token *pragma_tok, cclauses[C_OMP_CLAUSE_SPLIT_TARGET] = tc; } } - tree stmt = make_node (OMP_TARGET); - TREE_TYPE (stmt) = void_type_node; - OMP_TARGET_CLAUSES (stmt) = cclauses[C_OMP_CLAUSE_SPLIT_TARGET]; - c_omp_adjust_map_clauses (OMP_TARGET_CLAUSES (stmt), true); - OMP_TARGET_BODY (stmt) = body; - OMP_TARGET_COMBINED (stmt) = 1; - SET_EXPR_LOCATION (stmt, pragma_tok->location); - add_stmt (stmt); - pc = &OMP_TARGET_CLAUSES (stmt); - goto check_clauses; + c_omp_adjust_map_clauses (cclauses[C_OMP_CLAUSE_SPLIT_TARGET], true); + finish_omp_target (pragma_tok->location, +cclauses[C_OMP_CLAUSE_SPLIT_TARGET], body, true
Re: [PATCH, rs6000] Optimization for vec_xl_sext
Hi Hao Chen, I don't understand. This patch was already approved and you committed it. :-) I know because I needed to make corresponding adjustments to the new builtins code. Thanks, Bill On 11/15/21 8:16 PM, HAO CHEN GUI wrote: > Hi, > > The patch optimizes the code generation for vec_xl_sext builtin. Now all > the sign extensions are done on VSX registers directly. > > Bootstrapped and tested on powerpc64le-linux with no regressions. Is this > okay for trunk? Any recommendations? Thanks a lot. > > ChangeLog > > 2021-11-16 Haochen Gui > > gcc/ > * config/rs6000/rs6000-call.c (altivec_expand_lxvr_builtin): Modify > the expansion for sign extension. All extensions are done on VSX > registers. > > gcc/testsuite/ > * gcc.target/powerpc/p10_vec_xl_sext.c: New test. > > patch.diff > > diff --git a/gcc/config/rs6000/rs6000-call.c b/gcc/config/rs6000/rs6000-call.c > index b4e13af4dc6..587e9fa2a2a 100644 > --- a/gcc/config/rs6000/rs6000-call.c > +++ b/gcc/config/rs6000/rs6000-call.c > @@ -9779,7 +9779,7 @@ altivec_expand_lxvr_builtin (enum insn_code icode, tree > exp, rtx target, bool bl > > if (sign_extend) > { > - rtx discratch = gen_reg_rtx (DImode); > + rtx discratch = gen_reg_rtx (V2DImode); > rtx tiscratch = gen_reg_rtx (TImode); > > /* Emit the lxvr*x insn. */ > @@ -9788,20 +9788,31 @@ altivec_expand_lxvr_builtin (enum insn_code icode, > tree exp, rtx target, bool bl > return 0; > emit_insn (pat); > > - /* Emit a sign extension from QI,HI,WI to double (DI). */ > - rtx scratch = gen_lowpart (smode, tiscratch); > + /* Emit a sign extension from V16QI,V8HI,V4SI to V2DI. 
*/ > + rtx temp1, temp2; > if (icode == CODE_FOR_vsx_lxvrbx) > - emit_insn (gen_extendqidi2 (discratch, scratch)); > + { > + temp1 = simplify_gen_subreg (V16QImode, tiscratch, TImode, 0); > + emit_insn (gen_vsx_sign_extend_qi_v2di (discratch, temp1)); > + } > else if (icode == CODE_FOR_vsx_lxvrhx) > - emit_insn (gen_extendhidi2 (discratch, scratch)); > + { > + temp1 = simplify_gen_subreg (V8HImode, tiscratch, TImode, 0); > + emit_insn (gen_vsx_sign_extend_hi_v2di (discratch, temp1)); > + } > else if (icode == CODE_FOR_vsx_lxvrwx) > - emit_insn (gen_extendsidi2 (discratch, scratch)); > - /* Assign discratch directly if scratch is already DI. */ > - if (icode == CODE_FOR_vsx_lxvrdx) > - discratch = scratch; > + { > + temp1 = simplify_gen_subreg (V4SImode, tiscratch, TImode, 0); > + emit_insn (gen_vsx_sign_extend_si_v2di (discratch, temp1)); > + } > + else if (icode == CODE_FOR_vsx_lxvrdx) > + discratch = simplify_gen_subreg (V2DImode, tiscratch, TImode, 0); > + else > + gcc_unreachable (); > > - /* Emit the sign extension from DI (double) to TI (quad). */ > - emit_insn (gen_extendditi2 (target, discratch)); > + /* Emit the sign extension from V2DI (double) to TI (quad). 
*/ > + temp2 = simplify_gen_subreg (TImode, discratch, V2DImode, 0); > + emit_insn (gen_extendditi2_vector (target, temp2)); > > return target; > } > diff --git a/gcc/testsuite/gcc.target/powerpc/p10_vec_xl_sext.c > b/gcc/testsuite/gcc.target/powerpc/p10_vec_xl_sext.c > new file mode 100644 > index 000..78e72ac5425 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/powerpc/p10_vec_xl_sext.c > @@ -0,0 +1,35 @@ > +/* { dg-do compile } */ > +/* { dg-require-effective-target int128 } */ > +/* { dg-require-effective-target power10_ok } */ > +/* { dg-options "-mdejagnu-cpu=power10 -O2" } */ > + > +#include > + > +vector signed __int128 > +foo1 (signed long a, signed char *b) > +{ > + return vec_xl_sext (a, b); > +} > + > +vector signed __int128 > +foo2 (signed long a, signed short *b) > +{ > + return vec_xl_sext (a, b); > +} > + > +vector signed __int128 > +foo3 (signed long a, signed int *b) > +{ > + return vec_xl_sext (a, b); > +} > + > +vector signed __int128 > +foo4 (signed long a, signed long *b) > +{ > + return vec_xl_sext (a, b); > +} > + > +/* { dg-final { scan-assembler-times {\mvextsd2q\M} 4 } } */ > +/* { dg-final { scan-assembler-times {\mvextsb2d\M} 1 } } */ > +/* { dg-final { scan-assembler-times {\mvextsh2d\M} 1 } } */ > +/* { dg-final { scan-assembler-times {\mvextsw2d\M} 1 } } */ >
Re: [PATCH v4] Fix ICE when mixing VLAs and statement expressions [PR91038]
On Monday, 08.11.2021, 19:13 +0100, Martin Uecker wrote: > On Monday, 08.11.2021, 12:13 -0500, Jason Merrill wrote: > > On 11/7/21 01:40, Uecker, Martin wrote: > > > On Wednesday, 03.11.2021, 10:18 -0400, Jason Merrill wrote: > > ... > > > > Thank you! I made these changes and ran > > > bootstrap and tests again. > > > > Hmm, it doesn't look like you made the change to use the save_expr > > function instead of build1? > > Oh, sorry. I wanted to change it and then forgot. > Now also with this change (changelog as before). OK with this change? Best, Martin > > > Ok for trunk? > > > > > > > > > Any idea how to fix returning structs with > > > VLA member from statement expressions? > > > > Testcase? > > void foo(void) > { > ({ int N = 3; struct { char x[N]; } x; x; }); > } > > The difference to the tests in this patch (which I > also forgot to include in the last version) is that > the object of variable size is returned from the > statement expression and not a pointer to it. > This cannot happen with arrays because they decay > to pointers. > > > Martin > > > > > Otherwise, I will add an error message to > > > the FE in another patch. > > > > > > Martin > > > > > diff --git a/gcc/c-family/c-common.c b/gcc/c-family/c-common.c > index 436df45df68..95083f95442 100644 > --- a/gcc/c-family/c-common.c > +++ b/gcc/c-family/c-common.c > @@ -3306,7 +3306,19 @@ pointer_int_sum (location_t loc, enum tree_code > resultcode, >TREE_TYPE (result_type))) > size_exp = integer_one_node; >else > -size_exp = size_in_bytes_loc (loc, TREE_TYPE (result_type)); > +{ > + size_exp = size_in_bytes_loc (loc, TREE_TYPE (result_type)); > + /* Wrap the pointer expression in a SAVE_EXPR to make sure it > + is evaluated first when the size expression may depend > + on it for VM types. 
*/ > + if (TREE_SIDE_EFFECTS (size_exp) > + && TREE_SIDE_EFFECTS (ptrop) > + && variably_modified_type_p (TREE_TYPE (ptrop), NULL)) > + { > + ptrop = save_expr (ptrop); > + size_exp = build2 (COMPOUND_EXPR, TREE_TYPE (intop), ptrop, size_exp); > + } > +} > >/* We are manipulating pointer values, so we don't need to warn > about relying on undefined signed overflow. We disable the > diff --git a/gcc/gimplify.c b/gcc/gimplify.c > index c2ab96e7e18..84f7dc3c248 100644 > --- a/gcc/gimplify.c > +++ b/gcc/gimplify.c > @@ -2964,7 +2964,9 @@ gimplify_var_or_parm_decl (tree *expr_p) > declaration, for which we've already issued an error. It would > be really nice if the front end wouldn't leak these at all. > Currently the only known culprit is C++ destructors, as seen > - in g++.old-deja/g++.jason/binding.C. */ > + in g++.old-deja/g++.jason/binding.C. > + Another possible culpit are size expressions for variably modified > + types which are lost in the FE or not gimplified correctly. */ >if (VAR_P (decl) >&& !DECL_SEEN_IN_BIND_EXPR_P (decl) >&& !TREE_STATIC (decl) && !DECL_EXTERNAL (decl) > @@ -3109,16 +3111,22 @@ gimplify_compound_lval (tree *expr_p, gimple_seq > *pre_p, gimple_seq > *post_p, > expression until we deal with any variable bounds, sizes, or > positions in order to deal with PLACEHOLDER_EXPRs. > > - So we do this in three steps. First we deal with the annotations > - for any variables in the components, then we gimplify the base, > - then we gimplify any indices, from left to right. */ > + The base expression may contain a statement expression that > + has declarations used in size expressions, so has to be > + gimplified before gimplifying the size expressions. > + > + So we do this in three steps. First we deal with variable > + bounds, sizes, and positions, then we gimplify the base, > + then we deal with the annotations for any variables in the > + components and any indices, from left to right. 
*/ > + >for (i = expr_stack.length () - 1; i >= 0; i--) > { >tree t = expr_stack[i]; > >if (TREE_CODE (t) == ARRAY_REF || TREE_CODE (t) == ARRAY_RANGE_REF) > { > - /* Gimplify the low bound and element type size and put them into > + /* Deal with the low bound and element type size and put them into >the ARRAY_REF. If these values are set, they have already been >gimplified. */ > if (TREE_OPERAND (t, 2) == NULL_TREE) > @@ -3127,18 +3135,8 @@ gimplify_compound_lval (tree *expr_p, gimple_seq > *pre_p, gimple_seq > *post_p, > if (!is_gimple_min_invariant (low)) > { > TREE_OPERAND (t, 2) = low; > - tret = gimplify_expr (&TREE_OPERAND (t, 2), pre_p, > - post_p, is_gimple_reg, > - fb_rvalue); > - ret = MIN (ret, tret); > } > } > - else
Re: [PATCH] Loop unswitching: support gswitch statements.
On 11/11/21 08:15, Richard Biener wrote:
> If you look at simplify_using_entry_checks then this is really really simple, so I'd try to abstract this, recording something like an unswitch_predicate where we store the condition we unswitch on plus maybe cache the constant range of a VAR cmp CST variable condition on the true/false edge. We can then try to simplify each gcond/gswitch based on such an unswitch_predicate (when we ever scan the loop once to discover all opportunities we'd have a set of unswitch_predicates to try simplifying against). As said the integer range thing would be an improvement over the current state so even that can be done as followup but I guess for gswitch support that's going to be the thing to use.

I started working on the unswitch_predicate, where I also record the true/false-edge irange of the expression we unswitch on. I noticed one significant problem. Let's consider:

  for (int i = 0; i < size; i++)
    {
      double tmp;
      if (order == 1)
        tmp = -8 * a[i];
      else
        {
          if (order == 2)
            tmp = -4 * b[i];
          else
            tmp = a[i];
        }
      r[i] = 3.4f * tmp + d[i];
    }

We can end up with the first unswitching candidate being 'if (order == 2)' (I have a real benchmark where it happens). So I collect ranges, and they are [2,2] for the true edge and [-INF, 0], [3, INF] for the false edge (because we came to the condition through the order != 1 cond). Then the loop is cloned and we have

  if (order == 2)
    loop_version_1
  else
    loop_version_2

but in loop_version_2 we wrongly fold 'if (order == 1)' to false because it's reflected in the range.

So the question is, can one iterate get_loop_body stmts in some dominator order?

Thanks,
Martin
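To make the problem in the message above concrete, here is a hand-written sketch of what a correct unswitching on `order == 2` has to look like at the source level. It is not GCC output and does not use any GCC internals; the function names `compute` and `compute_unswitched` are made up for illustration. The point is that in the `else` version only `order != 2` is known, so the `order == 1` test must remain.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Original loop: both tests on the loop-invariant `order` run every iteration.
void compute(int order, const std::vector<double>& a,
             const std::vector<double>& b, const std::vector<double>& d,
             std::vector<double>& r)
{
  for (std::size_t i = 0; i < r.size(); i++)
    {
      double tmp;
      if (order == 1)
        tmp = -8 * a[i];
      else
        {
          if (order == 2)
            tmp = -4 * b[i];
          else
            tmp = a[i];
        }
      r[i] = 3.4f * tmp + d[i];
    }
}

// Hand-unswitched on `order == 2` (hypothetical illustration).
void compute_unswitched(int order, const std::vector<double>& a,
                        const std::vector<double>& b,
                        const std::vector<double>& d,
                        std::vector<double>& r)
{
  if (order == 2)
    {
      // loop_version_1: order is exactly 2, so `order == 1` is known false.
      for (std::size_t i = 0; i < r.size(); i++)
        {
          double tmp = -4 * b[i];
          r[i] = 3.4f * tmp + d[i];
        }
    }
  else
    {
      // loop_version_2: all we know is order != 2. The range that also
      // excludes 1 was only valid on the path through `order != 1` inside
      // the loop body, so the `order == 1` test has to stay here.
      for (std::size_t i = 0; i < r.size(); i++)
        {
          double tmp = (order == 1) ? -8 * a[i] : a[i];
          r[i] = 3.4f * tmp + d[i];
        }
    }
}
```

Calling both functions with `order` set to 1 is the interesting case: folding `order == 1` to false in the second version would silently compute `a[i]` instead of `-8 * a[i]`.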
[committed] libstdc++: Fix typos in tests
Tested x86_64-linux, pushed to trunk. libstdc++-v3/ChangeLog: * testsuite/21_strings/basic_string/allocator/71964.cc: Fix typo. * testsuite/23_containers/set/allocator/71964.cc: Likewise. --- .../testsuite/21_strings/basic_string/allocator/71964.cc| 2 +- libstdc++-v3/testsuite/23_containers/set/allocator/71964.cc | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/libstdc++-v3/testsuite/21_strings/basic_string/allocator/71964.cc b/libstdc++-v3/testsuite/21_strings/basic_string/allocator/71964.cc index c57cb96e971..4196b331aca 100644 --- a/libstdc++-v3/testsuite/21_strings/basic_string/allocator/71964.cc +++ b/libstdc++-v3/testsuite/21_strings/basic_string/allocator/71964.cc @@ -40,7 +40,7 @@ template a.moved_from = true; } -T* allocate(unsigned n) { return std::allocator{}.allcoate(n); } +T* allocate(unsigned n) { return std::allocator{}.allocate(n); } void deallocate(T* p, unsigned n) { std::allocator{}.deallocate(p, n); } bool moved_to; diff --git a/libstdc++-v3/testsuite/23_containers/set/allocator/71964.cc b/libstdc++-v3/testsuite/23_containers/set/allocator/71964.cc index 34a02d85e66..a2c166afd0f 100644 --- a/libstdc++-v3/testsuite/23_containers/set/allocator/71964.cc +++ b/libstdc++-v3/testsuite/23_containers/set/allocator/71964.cc @@ -40,7 +40,7 @@ template a.moved_from = true; } -T* allocate(unsigned n) { return std::allocator{}.allcoate(n); } +T* allocate(unsigned n) { return std::allocator{}.allocate(n); } void deallocate(T* p, unsigned n) { std::allocator{}.deallocate(p, n); } bool moved_to; -- 2.31.1
[committed] libstdc++: Fix out-of-bound array accesses in testsuite
Tested x86_64-linux, pushed to trunk. I fixed some undefined behaviour in string tests in r238609, but I only fixed the narrow char versions. This applies the same fixes to the wchar_t ones. These problems were found when testing a patch to make std::basic_string usable in constexpr. libstdc++-v3/ChangeLog: * testsuite/21_strings/basic_string/modifiers/append/wchar_t/1.cc: Fix reads past the end of strings. * testsuite/21_strings/basic_string/operations/compare/wchar_t/1.cc: Likewise. * testsuite/experimental/string_view/operations/compare/wchar_t/1.cc: Likewise. --- .../21_strings/basic_string/modifiers/append/wchar_t/1.cc | 2 +- .../21_strings/basic_string/operations/compare/wchar_t/1.cc | 4 ++-- .../experimental/string_view/operations/compare/wchar_t/1.cc | 4 ++-- 3 files changed, 5 insertions(+), 5 deletions(-) diff --git a/libstdc++-v3/testsuite/21_strings/basic_string/modifiers/append/wchar_t/1.cc b/libstdc++-v3/testsuite/21_strings/basic_string/modifiers/append/wchar_t/1.cc index bb2d682de8e..684209f143e 100644 --- a/libstdc++-v3/testsuite/21_strings/basic_string/modifiers/append/wchar_t/1.cc +++ b/libstdc++-v3/testsuite/21_strings/basic_string/modifiers/append/wchar_t/1.cc @@ -117,7 +117,7 @@ void test01(void) VERIFY( str06 == L"corpus, corpus" ); str06 = str02; - str06.append(L"corpus, ", 12); + str06.append(L"corpus, ", 9); // n=9 includes null terminator VERIFY( str06 != L"corpus, corpus, " ); diff --git a/libstdc++-v3/testsuite/21_strings/basic_string/operations/compare/wchar_t/1.cc b/libstdc++-v3/testsuite/21_strings/basic_string/operations/compare/wchar_t/1.cc index 27836f8e6fb..6f2113fb16a 100644 --- a/libstdc++-v3/testsuite/21_strings/basic_string/operations/compare/wchar_t/1.cc +++ b/libstdc++-v3/testsuite/21_strings/basic_string/operations/compare/wchar_t/1.cc @@ -81,8 +81,8 @@ test01() test_value(wcsncmp(str_1.data(), str_0.data(), 6), z); test_value(wcsncmp(str_1.data(), str_0.data(), 14), lt); test_value(wmemcmp(str_1.data(), str_0.data(), 6), 
z); - test_value(wmemcmp(str_1.data(), str_0.data(), 14), lt); - test_value(wmemcmp(L"costa marbella", L"costa rica", 14), lt); + test_value(wmemcmp(str_1.data(), str_0.data(), 10), lt); + test_value(wmemcmp(L"costa marbella", L"costa rica", 10), lt); // int compare(const basic_string& str) const; test_value(str_0.compare(str_1), gt); //because r>m diff --git a/libstdc++-v3/testsuite/experimental/string_view/operations/compare/wchar_t/1.cc b/libstdc++-v3/testsuite/experimental/string_view/operations/compare/wchar_t/1.cc index db523e6a83c..20bb030970b 100644 --- a/libstdc++-v3/testsuite/experimental/string_view/operations/compare/wchar_t/1.cc +++ b/libstdc++-v3/testsuite/experimental/string_view/operations/compare/wchar_t/1.cc @@ -81,8 +81,8 @@ test01() test_value(wcsncmp(str_1.data(), str_0.data(), 6), z); test_value(wcsncmp(str_1.data(), str_0.data(), 14), lt); test_value(wmemcmp(str_1.data(), str_0.data(), 6), z); - test_value(wmemcmp(str_1.data(), str_0.data(), 14), lt); - test_value(wmemcmp(L"costa marbella", L"costa rica", 14), lt); + test_value(wmemcmp(str_1.data(), str_0.data(), 10), lt); + test_value(wmemcmp(L"costa marbella", L"costa rica", 10), lt); // int compare(const basic_string_view& str) const; test_value(str_0.compare(str_1), gt); //because r>m -- 2.31.1
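The fix above works because `wmemcmp` and `basic_string::append(ptr, n)` read exactly `n` elements, whether or not a null terminator appears earlier, so `n` must not exceed the size of the shorter buffer. A minimal sketch of a guarded comparison follows; the helper name `bounded_wmemcmp` is hypothetical and is not part of the testsuite or of libstdc++.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cwchar>

// Compare at most `n` wide characters of two null-terminated strings without
// reading past the end of the shorter one.
int bounded_wmemcmp(const wchar_t* lhs, const wchar_t* rhs, std::size_t n)
{
  // +1 so that the null terminator itself may take part in the comparison.
  const std::size_t limit =
      std::min(std::wcslen(lhs), std::wcslen(rhs)) + 1;
  return std::wmemcmp(lhs, rhs, std::min(n, limit));
}
```

With this helper, a request for 14 elements against `L"costa rica"` (10 characters plus the terminator) is clamped to 11, and the comparison is already decided at the `m` versus `r` position.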
Re: [PATCH v2] configure: define TARGET_LIBC_GNUSTACK on musl
Hi, Looks fine to me. If possible, maybe it should even be back-ported to stable branches. Not sure if MIPS assembly sources (if any) in musl would need explicit .note.GNU-stack to complement this? Best regards, Dragan On 16-Nov-21 06:13, Ilya Lipnitskiy wrote: musl only uses PT_GNU_STACK to set default thread stack size and has no executable stack support[0], so there is no reason not to emit the .note.GNU-stack section on musl builds. [0]: https://lore.kernel.org/all/20190423192534.gn23...@brightrain.aerifal.cx/T/#u gcc/ChangeLog: * configure: Regenerate. * configure.ac: define TARGET_LIBC_GNUSTACK on musl Signed-off-by: Ilya Lipnitskiy --- gcc/configure| 3 +++ gcc/configure.ac | 3 +++ 2 files changed, 6 insertions(+) diff --git a/gcc/configure b/gcc/configure index 74b9d9be4c85..7091a838aefa 100755 --- a/gcc/configure +++ b/gcc/configure @@ -31275,6 +31275,9 @@ fi # Check if the target LIBC handles PT_GNU_STACK. gcc_cv_libc_gnustack=unknown case "$target" in + mips*-*-linux-musl*) +gcc_cv_libc_gnustack=yes +;; mips*-*-linux*) if test $glibc_version_major -gt 2 \ diff --git a/gcc/configure.ac b/gcc/configure.ac index c9ee1fb8919e..8a2d34179a75 100644 --- a/gcc/configure.ac +++ b/gcc/configure.ac @@ -6961,6 +6961,9 @@ fi # Check if the target LIBC handles PT_GNU_STACK. gcc_cv_libc_gnustack=unknown case "$target" in + mips*-*-linux-musl*) +gcc_cv_libc_gnustack=yes +;; mips*-*-linux*) GCC_GLIBC_VERSION_GTE_IFELSE([2], [31], [gcc_cv_libc_gnustack=yes], ) ;;
Re: [PATCH] Fix spelling of ones' complement.
> On Nov 16, 2021, at 2:03 AM, Aldy Hernandez via Gcc-patches > wrote: > > On Tue, Nov 16, 2021, 03:20 Marek Polacek via Gcc-patches < > gcc-patches@gcc.gnu.org> wrote: > >> On Tue, Nov 16, 2021 at 02:01:47AM +, Koning, Paul via Gcc-patches >> wrote: >>> >>> On Nov 15, 2021, at 8:48 PM, Marek Polacek via Gcc-patches < >> gcc-patches@gcc.gnu.org> wrote: Nitpicking time. It's spelled "ones' complement" rather than "one's complement". >>> >>> Is that so? I see Wikipedia claims it is, but there are no sources for >> that claim. (There is an assertion that it is "discussed at length on the >> talk page" of an article about number representation, but in fact there is >> no discussion there at all.) >>> >>> I have never seen this spelling before, and I very much doubt its >> validity. For one thing, why then have "two's complement"? For another, >> to pick one random authority, J.E. Thornton in "Design of a computer -- the >> Control Data 6600" refers to "one's complement" to describe the well known >> mode used by that machine and its relatives. >> >> Knuth, The Art of Computer Programming Volume 2, page 203-4: >> >> "A two's complement number is complemented with respect to a single >> power of 2, while a ones' complement number is complemented with respect >> to a long sequence of 1s." >> > > I think you get to do a drop mike when you pull out Knuth. > > :-) If that were the only source, sure. But with authoritative sources for both terms (with the ones I quoted being the earlier ones), at the very least there is an argument that both terms are used. Some more: DEC PDP-1 handbook (April 1960), page 9: "Negative numbers are represented as the 1's complement of the positive numbers." Univac 1107 CPU manual, page 2-6: "Next, the adder subtracts the one's complement..." CDC 160 programming manual (1963), page 2-1: "All arithmetic is binary, one's complement notation". Incidentally, these are four of the five machines cited by the Wikipedia article.
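Whatever the spelling, the arithmetic distinction in the Knuth quotation can be written down directly. The sketch below assumes an 8-bit word purely for illustration; the function names are made up. Negation in ones' complement subtracts from a row of eight 1 bits (2^8 - 1), while negation in two's complement subtracts from the single power 2^8.

```cpp
#include <cstdint>

// Ones' complement negation: complement with respect to 11111111 (2^8 - 1).
constexpr std::uint8_t ones_complement_neg(std::uint8_t x)
{
  return static_cast<std::uint8_t>(0xFFu - x);
}

// Two's complement negation: complement with respect to 2^8, modulo 2^8.
constexpr std::uint8_t twos_complement_neg(std::uint8_t x)
{
  return static_cast<std::uint8_t>(0x100u - x);
}

static_assert(ones_complement_neg(5) == 0xFA, "255 - 5");
static_assert(twos_complement_neg(5) == 0xFB, "256 - 5");
// Ones' complement has a second, "negative" zero; two's complement does not.
static_assert(ones_complement_neg(0) == 0xFF, "negative zero");
static_assert(twos_complement_neg(0) == 0x00, "single zero");
```

The two results always differ by one, which is why ones' complement hardware needs an end-around carry that two's complement hardware does not.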
Re: [PATCH] Loop unswitching: support gswitch statements.
On 11/11/21 08:15, Richard Biener wrote:
> So I'd try to do no functional change first, improving the costing and setting up the transform to simply pick up the stmts to "fold" as discovered during analysis (as I hinted you possibly can use gimple_uid to mark the stmts that simplify, IIRC gimple_uid is preserved during copying. gimple_uid would also scale better than gimple_plf in case we do the analysis for all candidates at once).

Thinking about the analysis. Am I correct that we want to properly calculate the loop size for the true and false edges of a potential gcond before actually unswitching? We can do that by finding a first gcond candidate, evaluating (symbolic + irange approach) all other gconds in the loop body, and using BB_REACHABLE discovery, similarly to what we do now at lines 378-446. Then tree_num_loop_insns can be adjusted to count only these reachable blocks. Having that, we can calculate the number of insns that will live in the true/false loops. Then we can call tree_unswitch_loop and do the gcond folding as we do in the versioned loops.

Is this a step in the right direction? Having that, we can then extend it to gswitch statements.

Cheers,
Martin
Re: [PATCH] Fix spelling of ones' complement.
On Tue, Nov 16, 2021 at 3:40 PM Koning, Paul wrote: > > > > > On Nov 16, 2021, at 2:03 AM, Aldy Hernandez via Gcc-patches > > wrote: > > > > On Tue, Nov 16, 2021, 03:20 Marek Polacek via Gcc-patches < > > gcc-patches@gcc.gnu.org> wrote: > > > >> On Tue, Nov 16, 2021 at 02:01:47AM +, Koning, Paul via Gcc-patches > >> wrote: > >>> > >>> > On Nov 15, 2021, at 8:48 PM, Marek Polacek via Gcc-patches < > >> gcc-patches@gcc.gnu.org> wrote: > > Nitpicking time. It's spelled "ones' complement" rather than "one's > complement". > >>> > >>> Is that so? I see Wikipedia claims it is, but there are no sources for > >> that claim. (There is an assertion that it is "discussed at length on the > >> talk page" of an article about number representation, but in fact there is > >> no discussion there at all.) > >>> > >>> I have never seen this spelling before, and I very much doubt its > >> validity. For one thing, why then have "two's complement"? For another, > >> to pick one random authority, J.E. Thornton in "Design of a computer -- the > >> Control Data 6600" refers to "one's complement" to describe the well known > >> mode used by that machine and its relatives. > >> > >> Knuth, The Art of Computer Programming Volume 2, page 203-4: > >> > >> "A two's complement number is complemented with respect to a single > >> power of 2, while a ones' complement number is complemented with respect > >> to a long sequence of 1s." > >> > > > > I think you get to do a drop mike when you pull out Knuth. > > > > :-) > > If that were the only source, sure. But with authoritative sources for both > terms (with the ones I quoted being the earlier ones) at the very least there > is an argument that both terms are used. > > Some more: DEC PDP-1 handbook (April 1960), page 9: "Negative numbers are > represented as the 1's complement of the positive numbers." > > Univac 1107 CPU manual, page 2-6: "Next, the adder subtracts the one's > complement..." 
> > CDC 160 programming manual (1963), page 2-1: "All arithmetic is binary, one's > complement notation". > > Incidentally, these are the four of the five machines cited by the Wikipedia > article. All sources before Knuth are clearly wrong. How could they not? Folks living in the pre-Knuth era lived without a deity. :-P
[PATCH v2] c++: improve print_node of PTRMEM_CST
On 11/4/21 16:32, Jakub Jelinek wrote:
> On Thu, Nov 04, 2021 at 11:52:34AM -0400, Jason Merrill via Gcc-patches wrote:
> > It's been inconvenient that pretty-printing of PTRMEM_CST didn't display
> > what member the constant refers to.
> >
> > Adding that is complicated by the absence of a langhook for
> > CONSTANT_CLASS_P nodes; the simplest fix for that is to use the
> > tcc_exceptional hook for tcc_constant as well.
> >
> > Tested x86_64-pc-linux-gnu. OK for trunk, or should I add a new hook for
> > constants?
> >
> > gcc/cp/ChangeLog:
> >
> >   * ptree.c (cxx_print_xnode): Handle PTRMEM_CST.
> >
> > gcc/ChangeLog:
> >
> >   * print-tree.c (print_node): Also call print_xnode hook for
> >   tcc_constant class.
>
> I think using the same langhook is fine, but in that case certainly
> /* Called by print_tree when there is a tree of class tcc_exceptional
>    that it doesn't know how to display. */
> should be adjusted so that it mentions also tcc_constant.

Done.

> And maybe rename it from print_xnode to print_node?

I think changing the comment is enough, it's still just exceptional and constant.

This is what I'm pushing:

From 761b128dbfa2fbc1f1a0138160a39db95db7759a Mon Sep 17 00:00:00 2001
From: Jason Merrill
Date: Fri, 29 Oct 2021 16:39:01 -0400
Subject: [PATCH] c++: improve print_node of PTRMEM_CST
To: gcc-patches@gcc.gnu.org

It's been inconvenient that pretty-printing of PTRMEM_CST didn't display what member the constant refers to.

Adding that is complicated by the absence of a langhook for CONSTANT_CLASS_P nodes; the simplest fix for that is to use the tcc_exceptional hook for tcc_constant as well.

gcc/cp/ChangeLog:

  * ptree.c (cxx_print_xnode): Handle PTRMEM_CST.

gcc/ChangeLog:

  * langhooks.h (struct lang_hooks): Adjust comment.
  * print-tree.c (print_node): Also call print_xnode hook for
  tcc_constant class.
--- gcc/langhooks.h | 2 +- gcc/cp/ptree.c | 3 +++ gcc/print-tree.c | 3 +-- 3 files changed, 5 insertions(+), 3 deletions(-) diff --git a/gcc/langhooks.h b/gcc/langhooks.h index 3e89134e8b4..3db8f2a550d 100644 --- a/gcc/langhooks.h +++ b/gcc/langhooks.h @@ -477,7 +477,7 @@ struct lang_hooks void (*print_statistics) (void); /* Called by print_tree when there is a tree of class tcc_exceptional - that it doesn't know how to display. */ + or tcc_constant that it doesn't know how to display. */ lang_print_tree_hook print_xnode; /* Called to print language-dependent parts of tcc_decl, tcc_type, diff --git a/gcc/cp/ptree.c b/gcc/cp/ptree.c index ca7884db39b..d514aa2cad2 100644 --- a/gcc/cp/ptree.c +++ b/gcc/cp/ptree.c @@ -379,6 +379,9 @@ cxx_print_xnode (FILE *file, tree node, int indent) if (tree message = STATIC_ASSERT_MESSAGE (node)) print_node (file, "message", message, indent+4); break; +case PTRMEM_CST: + print_node (file, "member", PTRMEM_CST_MEMBER (node), indent+4); + break; default: break; } diff --git a/gcc/print-tree.c b/gcc/print-tree.c index d1fbd044c27..b5dc523fcb1 100644 --- a/gcc/print-tree.c +++ b/gcc/print-tree.c @@ -1004,8 +1004,7 @@ print_node (FILE *file, const char *prefix, tree node, int indent, break; default: - if (EXCEPTIONAL_CLASS_P (node)) - lang_hooks.print_xnode (file, node, indent); + lang_hooks.print_xnode (file, node, indent); break; } -- 2.27.0
[committed] analyzer: fix overeager sharing of bounded_range instances [PR102662]
This was leading to an assertion failure ICE on a switch stmt when using -fstrict-enums, due to erroneously reusing a range involving one enum with a range involving a different enum. Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu. Pushed to trunk as r12-5307-ge1c0c908f85816240b685a5be4f0e5a0e6634979. gcc/analyzer/ChangeLog: PR analyzer/102662 * constraint-manager.cc (bounded_range::operator==): Require the types to be the same for equality. gcc/testsuite/ChangeLog: PR analyzer/102662 * g++.dg/analyzer/pr102662.C: New test. Signed-off-by: David Malcolm --- gcc/analyzer/constraint-manager.cc | 4 ++- gcc/testsuite/g++.dg/analyzer/pr102662.C | 39 2 files changed, 42 insertions(+), 1 deletion(-) create mode 100644 gcc/testsuite/g++.dg/analyzer/pr102662.C diff --git a/gcc/analyzer/constraint-manager.cc b/gcc/analyzer/constraint-manager.cc index 6df23fb477e..ea6b5dc60e0 100644 --- a/gcc/analyzer/constraint-manager.cc +++ b/gcc/analyzer/constraint-manager.cc @@ -432,7 +432,9 @@ bounded_range::intersects_p (const bounded_range &other, bool bounded_range::operator== (const bounded_range &other) const { - return (tree_int_cst_equal (m_lower, other.m_lower) + return (TREE_TYPE (m_lower) == TREE_TYPE (other.m_lower) + && TREE_TYPE (m_upper) == TREE_TYPE (other.m_upper) + && tree_int_cst_equal (m_lower, other.m_lower) && tree_int_cst_equal (m_upper, other.m_upper)); } diff --git a/gcc/testsuite/g++.dg/analyzer/pr102662.C b/gcc/testsuite/g++.dg/analyzer/pr102662.C new file mode 100644 index 000..99252c7d109 --- /dev/null +++ b/gcc/testsuite/g++.dg/analyzer/pr102662.C @@ -0,0 +1,39 @@ +/* { dg-additional-options "-fstrict-enums" } */ + +enum OpCode { + OP_MOVE, + OP_LOADK, + OP_LOADBOOL, + OP_LOADNIL, + OP_GETUPVAL, + OP_SETUPVAL +}; + +enum OpArg { + OpArgN, + OpArgU, + OpArgR, + OpArgK +}; + +void +symbexec_lastpc (enum OpCode symbexec_lastpc_op, enum OpArg luaP_opmodes) +{ + switch (luaP_opmodes) +{ +case OpArgN: +case OpArgK: + { +switch (symbexec_lastpc_op) + 
{ + case OP_LOADNIL: + case OP_SETUPVAL: +break; + default: +break; + } + } +default: + break; +} +} -- 2.26.3
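For readers who want the gist of the fix without the tree machinery, the following is a minimal sketch and not GCC code. The names type_tag, int_cst and make_range are hypothetical stand-ins for tree type nodes and INTEGER_CSTs; only the shape of operator== mirrors the patch, which makes two ranges with equal values but different enum types compare unequal.

```cpp
#include <cassert>

// Hypothetical stand-in for a GCC type node; only its identity matters here.
struct type_tag { const char *name; };

// Hypothetical stand-in for an INTEGER_CST: a value plus the type it has.
struct int_cst
{
  const type_tag *type;
  long value;
};

// Simplified model of the analyzer's bounded_range.
struct bounded_range
{
  int_cst lower;
  int_cst upper;

  bounded_range (int_cst lo, int_cst hi) : lower (lo), upper (hi)
  {
    assert (lo.value <= hi.value);
  }

  // As in the patch: the bound types must match before values are compared.
  bool operator== (const bounded_range &other) const
  {
    return lower.type == other.lower.type
           && upper.type == other.upper.type
           && lower.value == other.lower.value
           && upper.value == other.upper.value;
  }
};

// Hypothetical helper: build the range [lo, hi] in type T.
inline bounded_range
make_range (const type_tag &t, long lo, long hi)
{
  return bounded_range (int_cst{&t, lo}, int_cst{&t, hi});
}
```

With two distinct type_tag objects standing in for OpCode and OpArg, make_range (op_code, 0, 3) and make_range (op_arg, 0, 3) no longer compare equal, so a cached range for one enum would not be reused for the other.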
Re: [PATCH 1/5] libstdc++: Import the fast_float library
On Tue, 16 Nov 2021, Florian Weimer wrote: > * Patrick Palka via Libstdc: > > > This copies the fast_float library[1] into the compiled-in library > > sources. We're going to use this library in our floating-point > > std::from_chars implementation for faster and more portable parsing of > > binary32/64 decimal strings. > > > > [1]: https://github.com/fastfloat/fast_float > > > > Series tested on x86_64, i686, ppc64, ppc64le and aarch64, does it > > look OK for trunk? > > Missing Signed-off-by:? Oops, fixed in the below patch. > > > diff --git a/libstdc++-v3/src/c++17/fast_float/LICENSE-APACHE > > b/libstdc++-v3/src/c++17/fast_float/LICENSE-APACHE > > new file mode 100644 > > index 000..26f4398f249 > > --- /dev/null > > +++ b/libstdc++-v3/src/c++17/fast_float/LICENSE-APACHE > > @@ -0,0 +1,190 @@ > > + Apache License > > + Version 2.0, January 2004 > > +http://www.apache.org/licenses/ > > > diff --git a/libstdc++-v3/src/c++17/fast_float/LICENSE-MIT > > b/libstdc++-v3/src/c++17/fast_float/LICENSE-MIT > > new file mode 100644 > > index 000..2fb2a37ad7f > > --- /dev/null > > +++ b/libstdc++-v3/src/c++17/fast_float/LICENSE-MIT > > @@ -0,0 +1,27 @@ > > +MIT License > > + > > +Copyright (c) 2021 The fast_float authors > > You also need to include the README file, which makes it clear that > recipients can choose between Apache and MIT. GCC needs to use the MIT > option, I think. Also fixed. I noticed that the source repository contains the script ./script/amalgamate.py that generates a single-file version of the library for us, complete with an embedded copyright/license banner. This seems like a simpler way of integrating the library, so the below patch uses the amalgamation instead. -- >8 -- Subject: [PATCH 1/5] libstdc++: Import the fast_float library We're going to use the fast_float library in our (compiled-in) floating-point std::from_chars implementation for faster and more portable parsing of binary32/64 decimal strings. 
The single file fast_float.h is an amalgamation of the entire library, which can be (re)generated with the command

  python3 ./script/amalgamate.py --license=MIT \
    > $GCC_SRC/libstdc++-v3/src/c++17/fast_float/fast_float.h

[1]: https://github.com/fastfloat/fast_float

libstdc++-v3/ChangeLog:

    * src/c++17/fast_float/LOCAL_PATCHES: New file.
    * src/c++17/fast_float/MERGE: New file.
    * src/c++17/fast_float/README.md: New file, copied from the
    fast_float library sources.
    * src/c++17/fast_float/fast_float.h: New file, an amalgamation
    of the fast_float library.

Signed-off-by: Patrick Palka
---
 .../src/c++17/fast_float/LOCAL_PATCHES | 0
 libstdc++-v3/src/c++17/fast_float/MERGE | 4 +
 libstdc++-v3/src/c++17/fast_float/README.md | 218 ++
 .../src/c++17/fast_float/fast_float.h | 2944 +
 4 files changed, 3166 insertions(+)
 create mode 100644 libstdc++-v3/src/c++17/fast_float/LOCAL_PATCHES
 create mode 100644 libstdc++-v3/src/c++17/fast_float/MERGE
 create mode 100644 libstdc++-v3/src/c++17/fast_float/README.md
 create mode 100644 libstdc++-v3/src/c++17/fast_float/fast_float.h

diff --git a/libstdc++-v3/src/c++17/fast_float/LOCAL_PATCHES b/libstdc++-v3/src/c++17/fast_float/LOCAL_PATCHES
new file mode 100644
index 000..e69de29bb2d
diff --git a/libstdc++-v3/src/c++17/fast_float/MERGE b/libstdc++-v3/src/c++17/fast_float/MERGE
new file mode 100644
index 000..43bdc3981c8
--- /dev/null
+++ b/libstdc++-v3/src/c++17/fast_float/MERGE
@@ -0,0 +1,4 @@
+d35368cae610b4edeec61cd41e4d2367a4d33f58
+
+The first line of this file holds the git revision number of the
+last merge done from the master library sources.
diff --git a/libstdc++-v3/src/c++17/fast_float/README.md b/libstdc++-v3/src/c++17/fast_float/README.md new file mode 100644 index 000..1e1c06d0a3e --- /dev/null +++ b/libstdc++-v3/src/c++17/fast_float/README.md @@ -0,0 +1,218 @@ +## fast_float number parsing library: 4x faster than strtod + +/badge.svg) +/badge.svg) + + + +[](https://github.com/fastfloat/fast_float/actions/workflows/vs16-ci.yml) + +The fast_float library provides fast header-only implementations for the C++ from_chars +functions for `float` and `double` types. These functions convert ASCII strings representing +decimal values (e.g., `1.3e10`) into binary
[PATCH]middle-end: Fix FMA detection when inspecting gimple which have no LHS.
Hi All,

convert_mult_to_fma assumes that all gimple_assigns have a LHS set. This assumption is however not true when an IFN is kept around just for the side-effects. In those situations you have just the IFN and lhs will be null.

Since there's no LHS, there also can't be any ADD, and as such it can't be an FMA, so it's correct to just return early if there is no LHS.

Bootstrapped and regtested on aarch64-none-linux-gnu, x86_64-pc-linux-gnu and no regressions.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

    PR tree-optimization/103253
    * tree-ssa-math-opts.c (convert_mult_to_fma): Check for LHS.

gcc/testsuite/ChangeLog:

    * gcc.dg/vect/pr103253.c: New test.

--- inline copy of patch --
diff --git a/gcc/testsuite/gcc.dg/vect/pr103253.c b/gcc/testsuite/gcc.dg/vect/pr103253.c
new file mode 100644
index ..abe3f09f3818d79a53f2aa962c6b6c06855d618e
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr103253.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target fopenmp } */
+/* { dg-additional-options "-O2 -fexceptions -fopenmp -fno-delete-dead-exceptions -fno-trapping-math" } */
+
+double
+do_work (double do_work_pri)
+{
+  int i;
+
+#pragma omp simd
+  for (i = 0; i < 17; ++i)
+    do_work_pri = (!i ? 0.5 : i) * 2.0;
+
+  return do_work_pri;
+}
+
diff --git a/gcc/tree-ssa-math-opts.c b/gcc/tree-ssa-math-opts.c
index c4a6492b50df25b4cf296a75bd51e5af34eeacc7..cc8496c3c325f3cc303a90b9b9cac383e5a7942d 100644
--- a/gcc/tree-ssa-math-opts.c
+++ b/gcc/tree-ssa-math-opts.c
@@ -3224,6 +3224,10 @@ convert_mult_to_fma (gimple *mul_stmt, tree op1, tree op2,
                      fma_deferring_state *state, tree mul_cond = NULL_TREE)
 {
   tree mul_result = gimple_get_lhs (mul_stmt);
+  /* If there isn't a LHS then this can't be an FMA. There can be no LHS
+     if the statement was left just for the side-effects. */
+  if (!mul_result)
+    return false;
   tree type = TREE_TYPE (mul_result);
   gimple *use_stmt, *neguse_stmt;
   use_operand_p use_p;
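The guard added by this patch follows a common pattern: check for the missing result before dereferencing anything derived from it. The sketch below is a toy model and not GCC code. The mul_stmt struct and its fields are hypothetical; a null lhs stands for a statement that was kept only for its side effects.

```cpp
#include <cassert>

// Hypothetical, heavily simplified multiply statement.
struct mul_stmt
{
  const char *lhs;    // name of the result, or nullptr if there is none
  bool feeds_an_add;  // whether the result is consumed by an addition
};

// Toy version of the decision: without an LHS there can be no use of the
// result, hence no addition to fuse with, so bail out before touching it.
inline bool
can_convert_mult_to_fma (const mul_stmt &stmt)
{
  if (stmt.lhs == nullptr)
    return false;

  // From here on the result is known to exist and can be inspected.
  assert (stmt.lhs[0] != '\0');
  return stmt.feeds_an_add;
}
```

In the real function the equivalent of the early return has to come before TREE_TYPE (mul_result), which is the expression that would otherwise be evaluated on a null tree.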
[PATCH][committed]AArch64 shrn-combine-10: update test to current codegen.
Hi All,

When the rshrn commit was reverted I missed this testcase. This now updates it.

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Committed under the obvious rule.

Thanks,
Tamar

gcc/testsuite/ChangeLog:

    * gcc.target/aarch64/shrn-combine-10.c: Use shrn.

--- inline copy of patch --
diff --git a/gcc/testsuite/gcc.target/aarch64/shrn-combine-10.c b/gcc/testsuite/gcc.target/aarch64/shrn-combine-10.c
index 3a1cfce93e9065e8d5b43a770b0ef24a17586411..dc9e9be94cbe4ba81d936dfaf178674b9da31040 100644
--- a/gcc/testsuite/gcc.target/aarch64/shrn-combine-10.c
+++ b/gcc/testsuite/gcc.target/aarch64/shrn-combine-10.c
@@ -6,7 +6,7 @@

 uint32x4_t foo (uint64x2_t a, uint64x2_t b)
 {
-  return vrshrn_high_n_u64 (vrshrn_n_u64 (a, 32), b, 32);
+  return vshrn_high_n_u64 (vshrn_n_u64 (a, 32), b, 32);
 }

 /* { dg-final { scan-assembler-times {\tuzp2\t} 1 } } */
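For readers less familiar with the two intrinsic families, a scalar model of a single lane may help show why the test changes when rshrn is swapped for shrn. This is an illustrative sketch only: shrn_lane and rshrn_lane are made-up names, and the model assumes the usual description of these instructions (truncating versus rounding narrowing right shift of a 64-bit lane to 32 bits).

```cpp
#include <cassert>
#include <cstdint>

// One lane of a truncating narrowing shift: shift right by n and keep the
// low 32 bits of the result.
inline std::uint32_t
shrn_lane (std::uint64_t x, unsigned n)
{
  assert (n >= 1 && n <= 32);
  return static_cast<std::uint32_t> (x >> n);
}

// One lane of a rounding narrowing shift: conceptually (x + 2^(n-1)) >> n.
// Written as "shift, then add bit n-1" so the addition cannot overflow
// 64 bits; the two forms give the same result.
inline std::uint32_t
rshrn_lane (std::uint64_t x, unsigned n)
{
  assert (n >= 1 && n <= 32);
  std::uint64_t round_bit = (x >> (n - 1)) & 1u;
  return static_cast<std::uint32_t> ((x >> n) + round_bit);
}
```

With n = 32 the truncating form simply selects the high half of each 64-bit lane, which is why a single uzp2 can implement it; the rounding form also depends on bit 31 of the input, so it is not a pure lane selection.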
[PATCH][committed]middle-end signbit-2: make test check for scalar or vector versions
Hi All,

This updates the signbit-2 test to check for the scalar optimization if the target does not support vectorization.

Regtested on aarch64-none-linux-gnu, x86_64-pc-linux-gnu and no regressions.

Committed under the gcc obvious rule.

Thanks,
Tamar

gcc/testsuite/ChangeLog:

    * gcc.dg/signbit-2.c: Check vect or scalar.

--- inline copy of patch --
diff --git a/gcc/testsuite/gcc.dg/signbit-2.c b/gcc/testsuite/gcc.dg/signbit-2.c
index d8501e9b7a2d82b511ad0b3a44c0121d635972c0..b609f67dc9f8a949b86f0ec84144db834b9d531a 100644
--- a/gcc/testsuite/gcc.dg/signbit-2.c
+++ b/gcc/testsuite/gcc.dg/signbit-2.c
@@ -19,5 +19,6 @@ void fun2(int32_t *x, int n)
     x[i] = (-x[i]) >> 30;
 }

-/* { dg-final { scan-tree-dump {\s+>\s+\{ 0, 0, 0(, 0)+ \}} optimized } } */
+/* { dg-final { scan-tree-dump {\s+>\s+\{ 0(, 0)+ \}} optimized { target vect_int } } } */
+/* { dg-final { scan-tree-dump {\s+>\s+0} optimized { target { ! vect_int } } } } */
 /* { dg-final { scan-tree-dump-not {\s+>>\s+31} optimized } } */
Re: [PATCH 2/5] gimple-match: Add a gimple_extract_op function
Richard Biener writes: > On Wed, Nov 10, 2021 at 1:46 PM Richard Sandiford via Gcc-patches > wrote: >> >> code_helper and gimple_match_op seem like generally useful ways >> of summing up a gimple_assign or gimple_call (or gimple_cond). >> This patch adds a gimple_extract_op function that can be used >> for that. >> >> Tested on aarch64-linux-gnu and x86_64-linux-gnu. OK to install? >> >> Richard >> >> >> gcc/ >> * gimple-match.h (gimple_extract_op): Declare. >> * gimple-match.c (gimple_extract): New function, extracted from... >> (gimple_simplify): ...here. >> (gimple_extract_op): New function. >> --- >> gcc/gimple-match-head.c | 261 +++- >> gcc/gimple-match.h | 1 + >> 2 files changed, 149 insertions(+), 113 deletions(-) >> >> diff --git a/gcc/gimple-match-head.c b/gcc/gimple-match-head.c >> index 9d88b2f8551..4c6e0883ba4 100644 >> --- a/gcc/gimple-match-head.c >> +++ b/gcc/gimple-match-head.c >> @@ -890,12 +890,29 @@ try_conditional_simplification (internal_fn ifn, >> gimple_match_op *res_op, >>return true; >> } >> >> -/* The main STMT based simplification entry. It is used by the fold_stmt >> - and the fold_stmt_to_constant APIs. */ >> +/* Common subroutine of gimple_extract_op and gimple_simplify. Try to >> + describe STMT in RES_OP. Return: >> >> -bool >> -gimple_simplify (gimple *stmt, gimple_match_op *res_op, gimple_seq *seq, >> -tree (*valueize)(tree), tree (*top_valueize)(tree)) >> + - -1 if extraction failed >> + - otherwise, 0 if no simplification should take place >> + - otherwise, the number of operands for a GIMPLE_ASSIGN or GIMPLE_COND >> + - otherwise, -2 for a GIMPLE_CALL >> + >> + Before recording an operand, call: >> + >> + - VALUEIZE_CONDITION for a COND_EXPR condition >> + - VALUEIZE_NAME if the rhs of a GIMPLE_ASSIGN is an SSA_NAME > > I think at least VALUEIZE_NAME is unnecessary, see below Yeah, it's unnecessary. The idea was to (try to) ensure that gimple_simplify keeps all the microoptimisations that it had previously. 
This includes open-coding do_valueize for SSA_NAMEs and jumping straight to the right gimple_resimplifyN routine when the number of operands is already known. (The two calls to gimple_extract<> produce different functions that ought to get inlined into their single callers. A lot of the jumps should then be threaded.)

I can drop all that if you don't think it's worth it though. Just wanted to double-check first.

Thanks,
Richard

>> + - VALUEIZE_OP for every other top-level operand
>> +
>> + Each routine takes a tree argument and returns a tree. */
>> +
>> +template<typename ValueizeOp, typename ValueizeCondition,
>> +         typename ValueizeName>
>> +inline int
>> +gimple_extract (gimple *stmt, gimple_match_op *res_op,
>> + ValueizeOp valueize_op,
>> + ValueizeCondition valueize_condition,
>> + ValueizeName valueize_name)
>> {
>>switch (gimple_code (stmt))
>> {
>> @@ -911,100 +928,53 @@ gimple_simplify (gimple *stmt, gimple_match_op
>> *res_op, gimple_seq *seq,
>> || code == VIEW_CONVERT_EXPR)
>> {
>> tree op0 = TREE_OPERAND (gimple_assign_rhs1 (stmt), 0);
>> - bool valueized = false;
>> - op0 = do_valueize (op0, top_valueize, valueized);
>> - res_op->set_op (code, type, op0);
>> - return (gimple_resimplify1 (seq, res_op, valueize)
>> - || valueized);
>> + res_op->set_op (code, type, valueize_op (op0));
>> + return 1;
>> }
>> else if (code == BIT_FIELD_REF)
>> {
>> tree rhs1 = gimple_assign_rhs1 (stmt);
>> - tree op0 = TREE_OPERAND (rhs1, 0);
>> - bool valueized = false;
>> - op0 = do_valueize (op0, top_valueize, valueized);
>> + tree op0 = valueize_op (TREE_OPERAND (rhs1, 0));
>> res_op->set_op (code, type, op0,
>> TREE_OPERAND (rhs1, 1),
>> TREE_OPERAND (rhs1, 2),
>> REF_REVERSE_STORAGE_ORDER (rhs1));
>> - if (res_op->reverse)
>> - return valueized;
>> - return (gimple_resimplify3 (seq, res_op, valueize)
>> - || valueized);
>> + return res_op->reverse ?
0 : 3; >> } >> - else if (code == SSA_NAME >> -&& top_valueize) >> + else if (code == SSA_NAME) >> { >> tree op0 = gimple_assign_rhs1 (stmt); >> - tree valueized = top_valueize (op0); >> + tree valueized = valueize_name (op0); >> if (!valueized || op0 == valueized) >> - return false; >> + return -1; >> res_op->set_op (TREE_CODE (op0), type, valueized); >> -
Re: [PATCH 1/5] libstdc++: Import the fast_float library
On Tue, 16 Nov 2021 at 16:31, Patrick Palka via Libstdc++ wrote:
> [..]
> -- >8 --
>
> Subject: [PATCH 1/5] libstdc++: Import the fast_float library
> [..]
> +## Reference
> +
> +- Daniel Lemire, [Number Parsing at a Gigabyte per
> Second](https://arxiv.org/abs/2101.11408), Software: Pratice and Experience
> 51 (8), 2021.

There is a typo in the title at the very end: s/Pratice/Practice (See https://arxiv.org/abs/2101.11408)

- Daniel
Re: [PATCH 4/5] vect: Make reduction code handle calls
Richard Biener via Gcc-patches writes: > On Wed, Nov 10, 2021 at 1:48 PM Richard Sandiford via Gcc-patches > wrote: >> >> This patch extends the reduction code to handle calls. So far >> it's a structural change only; a later patch adds support for >> specific function reductions. >> >> Most of the patch consists of using code_helper and gimple_match_op >> to describe the reduction operations. The other main change is that >> vectorizable_call now needs to handle fully-predicated reductions. >> >> Tested on aarch64-linux-gnu and x86_64-linux-gnu. OK to install? >> >> Richard >> >> >> gcc/ >> * builtins.h (associated_internal_fn): Declare overload that >> takes a (combined_cfn, return type) pair. >> * builtins.c (associated_internal_fn): Split new overload out >> of original fndecl version. Also provide an overload that takes >> a (combined_cfn, return type) pair. >> * internal-fn.h (commutative_binary_fn_p): Declare. >> (associative_binary_fn_p): Likewise. >> * internal-fn.c (commutative_binary_fn_p): New function, >> split out from... >> (first_commutative_argument): ...here. >> (associative_binary_fn_p): New function. >> * gimple-match.h (code_helper): Add a constructor that takes >> internal functions. >> (commutative_binary_op_p): Declare. >> (associative_binary_op_p): Likewise. >> (canonicalize_code): Likewise. >> (directly_supported_p): Likewise. >> (get_conditional_internal_fn): Likewise. >> (gimple_build): New overload that takes a code_helper. >> * gimple-fold.c (gimple_build): Likewise. >> * gimple-match-head.c (commutative_binary_op_p): New function. >> (associative_binary_op_p): Likewise. >> (canonicalize_code): Likewise. >> (directly_supported_p): Likewise. >> (get_conditional_internal_fn): Likewise. >> * tree-vectorizer.h: Include gimple-match.h. >> (neutral_op_for_reduction): Take a code_helper instead of a >> tree_code. >> (needs_fold_left_reduction_p): Likewise. >> (reduction_fn_for_scalar_code): Likewise. 
>> (vect_can_vectorize_without_simd_p): Declare a new overload that takes
>> a code_helper.
>> * tree-vect-loop.c: Include case-cfn-macros.h.
>> (fold_left_reduction_fn): Take a code_helper instead of a tree_code.
>> (reduction_fn_for_scalar_code): Likewise.
>> (neutral_op_for_reduction): Likewise.
>> (needs_fold_left_reduction_p): Likewise.
>> (use_mask_by_cond_expr_p): Likewise.
>> (build_vect_cond_expr): Likewise.
>> (vect_create_partial_epilog): Likewise. Use gimple_build rather
>> than gimple_build_assign.
>> (check_reduction_path): Handle calls and operate on code_helpers
>> rather than tree_codes.
>> (vect_is_simple_reduction): Likewise.
>> (vect_model_reduction_cost): Likewise.
>> (vect_find_reusable_accumulator): Likewise.
>> (vect_create_epilog_for_reduction): Likewise.
>> (vect_transform_cycle_phi): Likewise.
>> (vectorizable_reduction): Likewise. Make more use of
>> lane_reduc_code_p.
>> (vect_transform_reduction): Use gimple_extract_op but expect
>> a tree_code for now.
>> (vect_can_vectorize_without_simd_p): New overload that takes
>> a code_helper.
>> * tree-vect-stmts.c (vectorizable_call): Handle reductions in
>> fully-masked loops.
>> * tree-vect-patterns.c (vect_mark_pattern_stmts): Use
>> gimple_extract_op when updating STMT_VINFO_REDUC_IDX.
>> --- >> gcc/builtins.c | 46 - >> gcc/builtins.h | 1 + >> gcc/gimple-fold.c| 9 + >> gcc/gimple-match-head.c | 70 +++ >> gcc/gimple-match.h | 20 ++ >> gcc/internal-fn.c| 46 - >> gcc/internal-fn.h| 2 + >> gcc/tree-vect-loop.c | 420 +++ >> gcc/tree-vect-patterns.c | 23 ++- >> gcc/tree-vect-stmts.c| 66 -- >> gcc/tree-vectorizer.h| 10 +- >> 11 files changed, 455 insertions(+), 258 deletions(-) >> >> diff --git a/gcc/builtins.c b/gcc/builtins.c >> index 384864bfb3a..03829c03a5a 100644 >> --- a/gcc/builtins.c >> +++ b/gcc/builtins.c >> @@ -2139,17 +2139,17 @@ mathfn_built_in_type (combined_fn fn) >> #undef SEQ_OF_CASE_MATHFN >> } >> >> -/* If BUILT_IN_NORMAL function FNDECL has an associated internal function, >> - return its code, otherwise return IFN_LAST. Note that this function >> - only tests whether the function is defined in internals.def, not whether >> - it is actually available on the target. */ >> +/* Check whether there is an internal function associated with function FN >> + and return type RETURN_TYPE. Return the function if so, otherwise return >> + IFN_LAST. >> >> -internal_fn >> -associated_internal_fn (tre
Re: [musl] Re: [PATCH v2] configure: define TARGET_LIBC_GNUSTACK on musl
On Tue, Nov 16, 2021 at 03:40:00PM +0100, Dragan Mladjenovic wrote:
> Hi,
>
> Looks fine to me. If possible, maybe it should even be back-ported
> to stable branches.
>
> Not sure if MIPS assembly sources (if any) in musl would need
> explicit .note.GNU-stack
>
> to complement this?

What are the actual consequences of making this change, and what is the goal? I'm concerned that it might produce object files which don't include annotation that they don't need executable stack, in which case the final executable file will be marked as executable-stack and the kernel will load it as such. That would be very bad.

Rich

> On 16-Nov-21 06:13, Ilya Lipnitskiy wrote:
> >musl only uses PT_GNU_STACK to set default thread stack size and has no
> >executable stack support[0], so there is no reason not to emit the
> >.note.GNU-stack section on musl builds.
> >
> >[0]:
> >https://lore.kernel.org/all/20190423192534.gn23...@brightrain.aerifal.cx/T/#u
> >
> >gcc/ChangeLog:
> >
> > * configure: Regenerate.
> > * configure.ac: define TARGET_LIBC_GNUSTACK on musl
> >
> >Signed-off-by: Ilya Lipnitskiy
> >---
> > gcc/configure| 3 +++
> > gcc/configure.ac | 3 +++
> > 2 files changed, 6 insertions(+)
> >
> >diff --git a/gcc/configure b/gcc/configure
> >index 74b9d9be4c85..7091a838aefa 100755
> >--- a/gcc/configure
> >+++ b/gcc/configure
> >@@ -31275,6 +31275,9 @@ fi
> > # Check if the target LIBC handles PT_GNU_STACK.
> > gcc_cv_libc_gnustack=unknown
> > case "$target" in
> >+ mips*-*-linux-musl*)
> >+gcc_cv_libc_gnustack=yes
> >+;;
> >mips*-*-linux*)
> > if test $glibc_version_major -gt 2 \
> >diff --git a/gcc/configure.ac b/gcc/configure.ac
> >index c9ee1fb8919e..8a2d34179a75 100644
> >--- a/gcc/configure.ac
> >+++ b/gcc/configure.ac
> >@@ -6961,6 +6961,9 @@ fi
> > # Check if the target LIBC handles PT_GNU_STACK.
> > gcc_cv_libc_gnustack=unknown > > case "$target" in > >+ mips*-*-linux-musl*) > >+gcc_cv_libc_gnustack=yes > >+;; > >mips*-*-linux*) > > GCC_GLIBC_VERSION_GTE_IFELSE([2], [31], [gcc_cv_libc_gnustack=yes], ) > > ;;
Re: [PATCH] simplify get_range_strlen interface
On 11/15/21 3:05 PM, Martin Sebor wrote: The deeply nested PHI handling in get_range_strlen_dynamic makes the code bigger and harder to follow than it would be if done in its own function. The attached patch does that. In addition, the get_range_strlen family of functions use a bitmap to avoid infinite recursion. Rather than dynamically allocating and freeing it on demand the attached patch simplifies the code by using an instance of auto_bitmap. This avoids the risk of neglecting to deallocate the bitmap. I forgot over the weekend that this change also fixes a bug: PR 102960. I have committed the fix in r12-5310 along with a test. Martin Tested on x86_64-linux. Martin
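The idea of tracking visited PHIs in an automatically managed container can be sketched outside GCC as follows. This is a toy model under stated assumptions: the node struct, max_length and max_length_1 are hypothetical, and a std::set local to the entry point plays the role that auto_bitmap plays in the patch, so nothing needs to be freed by hand on any return path.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical SSA-like graph: a node is either a leaf with a known length
// or a PHI that takes the value of one of its arguments. PHIs may form
// cycles, e.g. in loops.
struct node
{
  bool is_phi;
  unsigned length;        // meaningful for leaves only
  std::vector<int> args;  // meaningful for PHIs only
};

// Recursive worker, split out of the entry point much like the PHI handling
// in the patch. VISITED stops the walk from looping on cyclic PHIs.
inline unsigned
max_length_1 (const std::vector<node> &g, int idx, std::set<int> &visited)
{
  assert (idx >= 0 && static_cast<std::size_t> (idx) < g.size ());
  const node &n = g[idx];
  if (!n.is_phi)
    return n.length;
  if (!visited.insert (idx).second)
    return 0;  // this PHI is already part of the current walk

  unsigned best = 0;
  for (int arg : n.args)
    {
      unsigned len = max_length_1 (g, arg, visited);
      if (len > best)
        best = len;
    }
  return best;
}

// Entry point: the visited set lives on the stack and is released
// automatically when the function returns.
inline unsigned
max_length (const std::vector<node> &g, int idx)
{
  std::set<int> visited;
  return max_length_1 (g, idx, visited);
}
```

For a graph in which two PHIs refer to each other and to leaves of length 3 and 7, max_length terminates and reports 7 instead of recursing forever.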
[committed 1/2] libstdc++: Use hidden friends for vector<bool>::reference swap overloads
Tested x86_64-linux, committed to trunk.

These swap overloads are non-standard, but are needed to make swap work for vector<bool>::reference rvalues. They don't need to be called explicitly, only via ADL, so hide them from normal lookup.

This is what I've proposed as the resolution to LWG 3638.

libstdc++-v3/ChangeLog:

    * include/bits/stl_bvector.h (swap(_Bit_reference, _Bit_reference))
    (swap(_Bit_reference, bool&), swap(bool&, _Bit_reference)):
    Define as hidden friends of _Bit_reference.
---
 libstdc++-v3/include/bits/stl_bvector.h | 50 -
 1 file changed, 25 insertions(+), 25 deletions(-)

diff --git a/libstdc++-v3/include/bits/stl_bvector.h b/libstdc++-v3/include/bits/stl_bvector.h
index 381c47b6132..68070685baf 100644
--- a/libstdc++-v3/include/bits/stl_bvector.h
+++ b/libstdc++-v3/include/bits/stl_bvector.h
@@ -125,36 +125,36 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
     void
     flip() _GLIBCXX_NOEXCEPT
     { *_M_p ^= _M_mask; }
-  };

 #if __cplusplus >= 201103L
-  _GLIBCXX20_CONSTEXPR
-  inline void
-  swap(_Bit_reference __x, _Bit_reference __y) noexcept
-  {
-    bool __tmp = __x;
-    __x = __y;
-    __y = __tmp;
-  }
+    _GLIBCXX20_CONSTEXPR
+    friend void
+    swap(_Bit_reference __x, _Bit_reference __y) noexcept
+    {
+      bool __tmp = __x;
+      __x = __y;
+      __y = __tmp;
+    }

-  _GLIBCXX20_CONSTEXPR
-  inline void
-  swap(_Bit_reference __x, bool& __y) noexcept
-  {
-    bool __tmp = __x;
-    __x = __y;
-    __y = __tmp;
-  }
+    _GLIBCXX20_CONSTEXPR
+    friend void
+    swap(_Bit_reference __x, bool& __y) noexcept
+    {
+      bool __tmp = __x;
+      __x = __y;
+      __y = __tmp;
+    }

-  _GLIBCXX20_CONSTEXPR
-  inline void
-  swap(bool& __x, _Bit_reference __y) noexcept
-  {
-    bool __tmp = __x;
-    __x = __y;
-    __y = __tmp;
-  }
+    _GLIBCXX20_CONSTEXPR
+    friend void
+    swap(bool& __x, _Bit_reference __y) noexcept
+    {
+      bool __tmp = __x;
+      __x = __y;
+      __y = __tmp;
+    }
 #endif
+  };

   struct _Bit_iterator_base
   : public std::iterator<std::random_access_iterator_tag, bool>
-- 
2.31.1
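The hidden-friend technique used here can be shown with a small standalone proxy class. This is a sketch loosely modelled on _Bit_reference, not the libstdc++ code; bit_ref and its members are hypothetical names. The swap overloads are defined inside the class, so they are found only through argument-dependent lookup on a bit_ref argument.

```cpp
#include <cassert>

// Hypothetical proxy reference to a single bit inside a word.
class bit_ref
{
  unsigned long *word_;
  unsigned long mask_;

public:
  bit_ref (unsigned long *word, unsigned bit)
  : word_ (word), mask_ (1UL << bit)
  {
    assert (word != nullptr);
  }

  bit_ref (const bit_ref &) = default;

  // Read the referenced bit.
  operator bool () const { return (*word_ & mask_) != 0; }

  // Write the referenced bit.
  bit_ref &operator= (bool b)
  {
    if (b)
      *word_ |= mask_;
    else
      *word_ &= ~mask_;
    return *this;
  }

  // Assigning one proxy to another copies the bit value, not the proxy.
  bit_ref &operator= (const bit_ref &other) { return *this = bool (other); }

  // Hidden friends: visible to ADL only. Proxies are taken by value so
  // that rvalue proxies can be swapped.
  friend void swap (bit_ref x, bit_ref y) noexcept
  {
    bool tmp = x;
    x = y;
    y = tmp;
  }

  friend void swap (bit_ref x, bool &y) noexcept
  {
    bool tmp = x;
    x = y;
    y = tmp;
  }

  friend void swap (bool &x, bit_ref y) noexcept
  {
    bool tmp = x;
    x = y;
    y = tmp;
  }
};
```

An unqualified call such as swap (bit_ref (&w, 0), bit_ref (&w, 1)) finds the friend through ADL, while a qualified lookup like ::swap does not see it, which is the "hide them from normal lookup" effect described in the commit message.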
[committed 2/2] libstdc++: Implement constexpr std::basic_string for C++20
From: Michael de Lang Tested x86_64-linux, committed to trunk. This is only supported for the cxx11 ABI, not for COW strings. libstdc++-v3/ChangeLog: * include/bits/basic_string.h (basic_string, operator""s): Add constexpr for C++20. (basic_string::basic_string(basic_string&&)): Only copy initialized portion of the buffer. (basic_string::basic_string(basic_string&&, const Alloc&)): Likewise. * include/bits/basic_string.tcc (basic_string): Add constexpr for C++20. (basic_string::swap(basic_string&)): Only copy initialized portions of the buffers. (basic_string::_M_replace): Add constexpr implementation that doesn't depend on pointer comparisons. * include/bits/cow_string.h: Adjust comment. * include/ext/type_traits.h (__is_null_pointer): Add constexpr. * include/std/string (erase, erase_if): Add constexpr. * include/std/version (__cpp_lib_constexpr_string): Update value. * testsuite/21_strings/basic_string/cons/char/constexpr.cc: New test. * testsuite/21_strings/basic_string/cons/wchar_t/constexpr.cc: New test. * testsuite/21_strings/basic_string/literals/constexpr.cc: New test. * testsuite/21_strings/basic_string/modifiers/constexpr.cc: New test. * testsuite/21_strings/basic_string/modifiers/swap/char/constexpr.cc: New test. * testsuite/21_strings/basic_string/modifiers/swap/wchar_t/constexpr.cc: New test. * testsuite/21_strings/basic_string/version.cc: New test. 
--- libstdc++-v3/include/bits/basic_string.h | 274 -- libstdc++-v3/include/bits/basic_string.tcc| 69 - libstdc++-v3/include/bits/cow_string.h| 2 +- libstdc++-v3/include/ext/type_traits.h| 4 +- libstdc++-v3/include/std/string | 2 + libstdc++-v3/include/std/version | 6 +- .../basic_string/cons/char/constexpr.cc | 174 +++ .../basic_string/cons/wchar_t/constexpr.cc| 174 +++ .../basic_string/literals/constexpr.cc| 22 ++ .../basic_string/modifiers/constexpr.cc | 52 .../modifiers/swap/char/constexpr.cc | 49 .../modifiers/swap/wchar_t/constexpr.cc | 49 .../21_strings/basic_string/version.cc| 25 ++ 13 files changed, 869 insertions(+), 33 deletions(-) create mode 100644 libstdc++-v3/testsuite/21_strings/basic_string/cons/char/constexpr.cc create mode 100644 libstdc++-v3/testsuite/21_strings/basic_string/cons/wchar_t/constexpr.cc create mode 100644 libstdc++-v3/testsuite/21_strings/basic_string/literals/constexpr.cc create mode 100644 libstdc++-v3/testsuite/21_strings/basic_string/modifiers/constexpr.cc create mode 100644 libstdc++-v3/testsuite/21_strings/basic_string/modifiers/swap/char/constexpr.cc create mode 100644 libstdc++-v3/testsuite/21_strings/basic_string/modifiers/swap/wchar_t/constexpr.cc create mode 100644 libstdc++-v3/testsuite/21_strings/basic_string/version.cc diff --git a/libstdc++-v3/include/bits/basic_string.h b/libstdc++-v3/include/bits/basic_string.h index a6575fa9e26..b6945f1cdfb 100644 --- a/libstdc++-v3/include/bits/basic_string.h +++ b/libstdc++-v3/include/bits/basic_string.h @@ -57,12 +57,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION _GLIBCXX_BEGIN_NAMESPACE_CXX11 #ifdef __cpp_lib_is_constant_evaluated -// Support P1032R1 in C++20 (but not P0980R1 yet). -# define __cpp_lib_constexpr_string 201811L +// Support P0980R1 in C++20. +# define __cpp_lib_constexpr_string 201907L #elif __cplusplus >= 201703L && _GLIBCXX_HAVE_BUILTIN_IS_CONSTANT_EVALUATED // Support P0426R1 changes to char_traits in C++17. 
# define __cpp_lib_constexpr_string 201611L -#elif __cplusplus > 201703L #endif /** @@ -131,6 +130,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11 _Res>; // Allows an implicit conversion to __sv_type. + _GLIBCXX20_CONSTEXPR static __sv_type _S_to_string_view(__sv_type __svt) noexcept { return __svt; } @@ -141,7 +141,9 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11 // is provided. struct __sv_wrapper { - explicit __sv_wrapper(__sv_type __sv) noexcept : _M_sv(__sv) { } + _GLIBCXX20_CONSTEXPR explicit + __sv_wrapper(__sv_type __sv) noexcept : _M_sv(__sv) { } + __sv_type _M_sv; }; @@ -151,6 +153,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11 * @param __svw string view wrapper. * @param __a Allocator to use. */ + _GLIBCXX20_CONSTEXPR explicit basic_string(__sv_wrapper __svw, const _Alloc& __a) : basic_string(__svw._M_sv.data(), __svw._M_sv.size(), __a) { } @@ -163,9 +166,11 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11 _Alloc_hider(pointer __dat, const _Alloc& __a = _Alloc()) : allocator_type(__a), _M_p(__dat) { } #else + _GLIBCXX20_CONSTEXPR _Alloc_hider(pointer __dat, const _Al
[PATCH] Do not abort compilation when dump file is /dev/*
The `configure` scripts generated with autoconf often test compiler features by setting output to `/dev/null`, which then sets the dump folder as being /dev/* and the compilation halts with an error because GCC cannot create files in /dev/. This is a problem when configure is testing for compiler features because it cannot tell if the failure was due to unsupported features or any other problem, and disables the feature even if it is working.

As an example, running configure overriding CFLAGS="-fdump-ipa-clones" will result in several compiler features being disabled because of gcc halting with an error creating files in /dev/*.

This commit fixes this issue by checking if the dump folder is /dev/. If yes, then it just informs the user and disables dumping, but does not halt the compilation, and the compiler returns 0 to the shell.

gcc/ChangeLog
2021-11-16 Giuliano Belinassi

    * dumpfile.c (dump_open): Do not halt compilation when file
    matches /dev/*.

gcc/testsuite/ChangeLog
2021-11-16 Giuliano Belinassi

    * gcc.dg/devnull-dump.c: New.

Signed-off-by: Giuliano Belinassi
---
 gcc/dumpfile.c | 17 -
 gcc/testsuite/gcc.dg/devnull-dump.c | 7 +++
 2 files changed, 23 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/devnull-dump.c

diff --git a/gcc/dumpfile.c b/gcc/dumpfile.c
index 8169daf7f59..b1dbfb371af 100644
--- a/gcc/dumpfile.c
+++ b/gcc/dumpfile.c
@@ -378,7 +378,22 @@ dump_open (const char *filename, bool trunc)
   FILE *stream = fopen (filename, trunc ? "w" : "a");

   if (!stream)
-    error ("could not open dump file %qs: %m", filename);
+    {
+      /* Autoconf tests compiler functionalities by setting output to /dev/null.
+         In this case, if dumps are enabled, it will try to set the output
+         folder to /dev/*, which is of course invalid and the compiler will exit
+         with an error, resulting in configure script reporting the tested
+         feature as being unavailable. Here we test this case by checking if the
+         output file prefix has /dev/ and only inform the user in this case
+         rather than refusing to compile. */
+
+      const char *const slash_dev = "/dev/";
+      if (strncmp(slash_dev, filename, strlen(slash_dev)) == 0)
+        inform (UNKNOWN_LOCATION,
+                "could not open dump file %qs: %m. Dumps are disabled.", filename);
+      else
+        error ("could not open dump file %qs: %m", filename);
+    }
   return stream;
 }

diff --git a/gcc/testsuite/gcc.dg/devnull-dump.c b/gcc/testsuite/gcc.dg/devnull-dump.c
new file mode 100644
index 000..378e0901c28
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/devnull-dump.c
@@ -0,0 +1,7 @@
+/* { dg-do assemble } */
+/* { dg-options "-fdump-ipa-clones -o /dev/null" } */
+
+int main()
+{
+  return 0;
+}
-- 
2.33.1
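The prefix test at the heart of the patch can be tried in isolation. The helper below is a sketch with a hypothetical name, dump_path_is_under_dev; it performs the same strncmp comparison as the patch but leaves out the inform and error calls, which only exist inside GCC.

```cpp
#include <cassert>
#include <cstring>

// Returns true if FILENAME starts with "/dev/", the case the patch
// downgrades from a hard error to an informational note.
inline bool
dump_path_is_under_dev (const char *filename)
{
  assert (filename != nullptr);
  static const char slash_dev[] = "/dev/";
  return std::strncmp (slash_dev, filename, std::strlen (slash_dev)) == 0;
}
```

Note that this is a purely textual check on the dump file name: a path such as /dev/null.000i.cgraph matches, whereas a relative path or a path that merely contains /dev/ later on does not.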
[PATCH RFC] c-family: don't cache large vecs
Patrick observed recently that an element of the vector cache could be arbitrarily large. Let's only cache relatively small vecs.

This has no effect on compiling the libstdc++ stdc++.h, presumably because nothing in the library requires a vec that large.

I figure that this makes it more likely that a subsequent long list will reuse the same memory when the later vec gets expanded.

Does this make sense to others?

gcc/c-family/ChangeLog:

    * c-common.c (release_tree_vector): Only cache vecs smaller than
    16 elements.
---
 gcc/c-family/c-common.c | 12 ++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/gcc/c-family/c-common.c b/gcc/c-family/c-common.c
index 436df45df68..90e8ec87b6b 100644
--- a/gcc/c-family/c-common.c
+++ b/gcc/c-family/c-common.c
@@ -8213,8 +8213,16 @@ release_tree_vector (vec<tree, va_gc> *vec)
 {
   if (vec != NULL)
     {
-      vec->truncate (0);
-      vec_safe_push (tree_vector_cache, vec);
+      if (vec->allocated () >= 16)
+        /* Don't cache vecs that have expanded more than once. On a p64
+           target, vecs double in alloc size with each power of 2 elements, e.g
+           at 16 elements the alloc increases from 128 to 256 bytes. */
+        vec_free (vec);
+      else
+        {
+          vec->truncate (0);
+          vec_safe_push (tree_vector_cache, vec);
+        }
     }
 }

base-commit: 132f1c27770fa6dafdf14591878d301aedd5ae16
-- 
2.27.0
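The caching policy proposed here can be modelled with standard containers. The sketch below is an illustration under assumptions, not the GCC implementation: vec_cache, int_vec and max_cached_capacity are hypothetical names, std::vector::capacity () stands in for vec->allocated (), and the threshold of 16 is taken from the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

using int_vec = std::vector<int>;

// Threshold from the patch: vectors that have grown to 16 or more slots
// are freed instead of cached.
const std::size_t max_cached_capacity = 16;

// Hypothetical free list standing in for tree_vector_cache.
struct vec_cache
{
  std::vector<std::unique_ptr<int_vec>> free_list;

  // Hand out a cached vector if one is available, otherwise a fresh one.
  std::unique_ptr<int_vec> make ()
  {
    if (free_list.empty ())
      return std::unique_ptr<int_vec> (new int_vec);
    std::unique_ptr<int_vec> v = std::move (free_list.back ());
    free_list.pop_back ();
    assert (v->empty ());
    return v;
  }

  // Return a vector to the cache, unless it has grown too large.
  void release (std::unique_ptr<int_vec> v)
  {
    if (!v)
      return;
    if (v->capacity () >= max_cached_capacity)
      return;  // v goes out of scope here and its storage is freed
    v->clear ();
    free_list.push_back (std::move (v));
  }
};
```

The effect is that the cache only ever holds small allocations, so one unusually long argument list cannot pin a large block of memory for the rest of the compilation.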
Re: [committed 2/2] libstdc++: Implement constexpr std::basic_string for C++20
Oops, the subject line was not supposed to say 2/2 for this commit,
and I was not supposed to have Michael de Lang as the author ... I
messed up my git send-email and git cherry-pick commands!

Sorry Michael, I originally tried to use your tests from
https://github.com/Oipo/gcc/ but as noted in
https://gcc.gnu.org/PR93989 those tests are incorrect, and so I didn't
actually use any of them (nor the std::string code itself). But
apparently the commit still had you as the author, because I reset the
content of the git tree, but not the commit author. I'll fix that in
GCC's ChangeLog file after it regenerates overnight.

On Tue, 16 Nov 2021 at 16:47, Jonathan Wakely wrote:
> From: Michael de Lang
>
> Tested x86_64-linux, committed to trunk.
>
> This is only supported for the cxx11 ABI, not for COW strings.
>
> libstdc++-v3/ChangeLog:
>
>         * include/bits/basic_string.h (basic_string, operator""s): Add
>         constexpr for C++20.
>         (basic_string::basic_string(basic_string&&)): Only copy
>         initialized portion of the buffer.
>         (basic_string::basic_string(basic_string&&, const Alloc&)):
>         Likewise.
>         * include/bits/basic_string.tcc (basic_string): Add constexpr
>         for C++20.
>         (basic_string::swap(basic_string&)): Only copy initialized
>         portions of the buffers.
>         (basic_string::_M_replace): Add constexpr implementation that
>         doesn't depend on pointer comparisons.
>         * include/bits/cow_string.h: Adjust comment.
>         * include/ext/type_traits.h (__is_null_pointer): Add constexpr.
>         * include/std/string (erase, erase_if): Add constexpr.
>         * include/std/version (__cpp_lib_constexpr_string): Update
>         value.
>         * testsuite/21_strings/basic_string/cons/char/constexpr.cc:
>         New test.
>         * testsuite/21_strings/basic_string/cons/wchar_t/constexpr.cc:
>         New test.
>         * testsuite/21_strings/basic_string/literals/constexpr.cc:
>         New test.
>         * testsuite/21_strings/basic_string/modifiers/constexpr.cc: New
>         test.
>         * testsuite/21_strings/basic_string/modifiers/swap/char/constexpr.cc:
>         New test.
>         * testsuite/21_strings/basic_string/modifiers/swap/wchar_t/constexpr.cc:
>         New test.
>         * testsuite/21_strings/basic_string/version.cc: New test.
>
Re: [PATCH 06/15] visium: Fix non-robust split condition in define_insn_and_split
> gcc/ChangeLog:
>
>         * config/visium/visium.md (*add3_insn, *addsi3_insn, *addi3_insn,
>         *sub3_insn, *subsi3_insn, *subdi3_insn, *neg2_insn,
>         *negdi2_insn, *and3_insn, *ior3_insn, *xor3_insn,
>         *one_cmpl2_insn, *ashl3_insn, *ashr3_insn,
>         *lshr3_insn, *trunchiqi2_insn, *truncsihi2_insn,
>         *truncdisi2_insn, *extendqihi2_insn, *extendqisi2_insn,
>         *extendhisi2_insn, *extendsidi2_insn, *zero_extendqihi2_insn,
>         *zero_extendqisi2_insn, *zero_extendsidi2_insn): Fix split condition.

OK for mainline, thanks.

-- 
Eric Botcazou
Re: [PATCH RFC] c-family: don't cache large vecs
On Tue, Nov 16, 2021 at 11:53:14AM -0500, Jason Merrill via Gcc-patches wrote:
> Patrick observed recently that an element of the vector cache could be
> arbitrarily large.  Let's only cache relatively small vecs.
>
> This has no effect on compiling the libstdc++ stdc++.h, presumably because
> nothing in the library requires a vec that large.  I figure that this makes it
> more likely that a subsequent long list will reuse the same memory when the
> later vec gets expanded.
>
> Does this make sense to others?

Looks good to me.

> gcc/c-family/ChangeLog:
>
>         * c-common.c (release_tree_vector): Only cache vecs smaller than
>         16 elements.
> ---
>  gcc/c-family/c-common.c | 12 ++--
>  1 file changed, 10 insertions(+), 2 deletions(-)
>
> diff --git a/gcc/c-family/c-common.c b/gcc/c-family/c-common.c
> index 436df45df68..90e8ec87b6b 100644
> --- a/gcc/c-family/c-common.c
> +++ b/gcc/c-family/c-common.c
> @@ -8213,8 +8213,16 @@ release_tree_vector (vec<tree, va_gc> *vec)
>  {
>    if (vec != NULL)
>      {
> -      vec->truncate (0);
> -      vec_safe_push (tree_vector_cache, vec);
> +      if (vec->allocated () >= 16)
> +        /* Don't cache vecs that have expanded more than once. On a p64
> +           target, vecs double in alloc size with each power of 2 elements, e.g
> +           at 16 elements the alloc increases from 128 to 256 bytes. */
> +        vec_free (vec);
> +      else
> +        {
> +          vec->truncate (0);
> +          vec_safe_push (tree_vector_cache, vec);
> +        }
>      }
>  }
>
>
> base-commit: 132f1c27770fa6dafdf14591878d301aedd5ae16
> --
> 2.27.0
>

Marek
[PATCH] rs6000: Add [power6-64] stanza to new builtin support
Hi!

While reviewing the recent 32-bit changes for the new builtin infrastructure,
I realized that I needed another stanza to represent builtins requiring both
-mcpu=power6 and -mpowerpc64.  (There's only one of these, but nonetheless...)
So this patch adds that support in the same fashion as [power7-64] and
[power9-64].

Bootstrapped and tested on powerpc64le-linux-gnu, and on
powerpc64-linux-gnu with -m32/-m64.  Is this okay for trunk?

Thanks!
Bill


2021-11-16  Bill Schmidt

gcc/
        * config/rs6000/rs6000-builtin-new.def: Add power6-64 stanza.
        Move CMPB to power6-64 stanza.
        * config/rs6000/rs6000-call.c (rs6000_invalid_new_builtin): Handle
        ENB_P6_64 case.
        (rs6000_new_builtin_is_supported): Likewise.
        (rs6000_expand_new_builtin): Likewise.
        (rs6000_init_builtins): Likewise.
        * config/rs6000/rs6000-gen-builtins.c (bif_stanza): Add BSTZ_P6_64.
        (stanza_map): Add entry mapping power6-64 to BSTZ_P6_64.
        (enable_string): Add "ENB_P6_64".
        (write_decls): Add ENB_P6_64 to bif_enable enum.
---
 gcc/config/rs6000/rs6000-builtin-new.def |  9 ++---
 gcc/config/rs6000/rs6000-call.c          | 10 ++
 gcc/config/rs6000/rs6000-gen-builtins.c  |  4
 3 files changed, 20 insertions(+), 3 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-builtin-new.def b/gcc/config/rs6000/rs6000-builtin-new.def
index 1dd8f6b40b2..58dfce1ca37 100644
--- a/gcc/config/rs6000/rs6000-builtin-new.def
+++ b/gcc/config/rs6000/rs6000-builtin-new.def
@@ -266,13 +266,16 @@

 ; Power6 builtins (ISA 2.05).
 [power6]
-  const signed long __builtin_p6_cmpb (signed long, signed long);
-    CMPB cmpbdi3 {}
-
   const signed int __builtin_p6_cmpb_32 (signed int, signed int);
     CMPB_32 cmpbsi3 {}


+; Power6 builtins requiring 64-bit GPRs (even with 32-bit addressing).
+[power6-64]
+  const signed long __builtin_p6_cmpb (signed long, signed long);
+    CMPB cmpbdi3 {}
+
+
 ; AltiVec builtins.
 [altivec]
   const vsc __builtin_altivec_abs_v16qi (vsc);
diff --git a/gcc/config/rs6000/rs6000-call.c b/gcc/config/rs6000/rs6000-call.c
index 83e1abb6118..822a9736591 100644
--- a/gcc/config/rs6000/rs6000-call.c
+++ b/gcc/config/rs6000/rs6000-call.c
@@ -11919,6 +11919,10 @@ rs6000_invalid_new_builtin (enum rs6000_gen_builtins fncode)
     case ENB_P6:
       error ("%qs requires the %qs option", name, "-mcpu=power6");
       break;
+    case ENB_P6_64:
+      error ("%qs requires the %qs option and either the %qs or %qs option",
+             name, "-mcpu=power6", "-m64", "-mpowerpc64");
+      break;
     case ENB_ALTIVEC:
       error ("%qs requires the %qs option", name, "-maltivec");
       break;
@@ -13346,6 +13350,8 @@ rs6000_new_builtin_is_supported (enum rs6000_gen_builtins fncode)
       return TARGET_POPCNTB;
     case ENB_P6:
       return TARGET_CMPB;
+    case ENB_P6_64:
+      return TARGET_CMPB && TARGET_POWERPC64;
     case ENB_P7:
       return TARGET_POPCNTD;
     case ENB_P7_64:
@@ -15697,6 +15703,8 @@ rs6000_expand_new_builtin (tree exp, rtx target,
   if (!(e == ENB_ALWAYS
         || (e == ENB_P5 && TARGET_POPCNTB)
         || (e == ENB_P6 && TARGET_CMPB)
+        || (e == ENB_P6_64 && TARGET_CMPB
+            && TARGET_POWERPC64)
         || (e == ENB_ALTIVEC && TARGET_ALTIVEC)
         || (e == ENB_CELL && TARGET_ALTIVEC
             && rs6000_cpu == PROCESSOR_CELL)
@@ -16419,6 +16427,8 @@ rs6000_init_builtins (void)
           continue;
         if (e == ENB_P6 && !TARGET_CMPB)
           continue;
+        if (e == ENB_P6_64 && !(TARGET_CMPB && TARGET_POWERPC64))
+          continue;
         if (e == ENB_ALTIVEC && !TARGET_ALTIVEC)
           continue;
         if (e == ENB_VSX && !TARGET_VSX)
diff --git a/gcc/config/rs6000/rs6000-gen-builtins.c b/gcc/config/rs6000/rs6000-gen-builtins.c
index 1655a2fd765..4ce83bd2290 100644
--- a/gcc/config/rs6000/rs6000-gen-builtins.c
+++ b/gcc/config/rs6000/rs6000-gen-builtins.c
@@ -212,6 +212,7 @@ enum bif_stanza
   BSTZ_ALWAYS,
   BSTZ_P5,
   BSTZ_P6,
+  BSTZ_P6_64,
   BSTZ_ALTIVEC,
   BSTZ_CELL,
   BSTZ_VSX,
@@ -245,6 +246,7 @@ static stanza_entry stanza_map[NUMBIFSTANZAS] =
     { "always",    BSTZ_ALWAYS  },
     { "power5",    BSTZ_P5      },
     { "power6",    BSTZ_P6      },
+    { "power6-64", BSTZ_P6_64   },
     { "altivec",   BSTZ_ALTIVEC },
     { "cell",      BSTZ_CELL    },
     { "vsx",       BSTZ_VSX     },
@@ -269,6 +271,7 @@ static const char *enable_string[NUMBIFSTANZAS] =
     "ENB_ALWAYS",
     "ENB_P5",
     "ENB_P6",
+    "ENB_P6_64",
     "ENB_ALTIVEC",
     "ENB_CELL",
     "ENB_VSX",
@@ -2227,6 +2230,7 @@ write_decls (void)
   fprintf (header_file, " ENB_ALWAYS,\n");
   fprintf (header_file, " ENB_P5,\n");
   fprintf (header_file, " ENB_P6,\n");
+  fprintf (header_file, " ENB_P6_64,\n");
   fprintf (header_file, " ENB_ALT
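The availability rule that the new stanza encodes is small enough to show on its own. What follows is a minimal sketch, not the rs6000 code: the enum bif_enable_sketch, struct target_flags_sketch and builtin_is_supported_sketch are hypothetical names, and the two booleans merely stand in for the TARGET_CMPB and TARGET_POWERPC64 macros used in the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, reduced version of the bif_enable enum.  */
enum bif_enable_sketch
{
  ENB_SK_ALWAYS,
  ENB_SK_P6,
  ENB_SK_P6_64
};

/* Hypothetical stand-ins for TARGET_CMPB and TARGET_POWERPC64.  */
struct target_flags_sketch
{
  bool cmpb;       /* ISA 2.05 cmpb instruction available */
  bool powerpc64;  /* 64-bit GPRs available, even with -m32 */
};

/* Return true if a builtin from stanza E may be used with flags T.  */
static bool
builtin_is_supported_sketch (enum bif_enable_sketch e,
                             struct target_flags_sketch t)
{
  switch (e)
    {
    case ENB_SK_ALWAYS:
      return true;
    case ENB_SK_P6:
      return t.cmpb;
    case ENB_SK_P6_64:
      /* The new stanza: both conditions must hold.  */
      return t.cmpb && t.powerpc64;
    }
  return false;
}
```

Under this rule a 32-bit Power6 target without -mpowerpc64 would still accept builtins from the plain power6 stanza, such as __builtin_p6_cmpb_32 in the .def hunk above, while rejecting __builtin_p6_cmpb.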
Re: [PATCH] rs6000: Add [power6-64] stanza to new builtin support
Sorry, I forgot to CC maintainers on this one. Thanks! Bill On 11/16/21 11:06 AM, Bill Schmidt wrote: > Hi! While reviewing the recent 32-bit changes for the new builtin > infrastructure, > I realized that I needed another stanza to represent builtins requiring both > -mcpu=power6 and -mpowerpc64. (There's only one of these, but nonetheless...) > So this patch adds that support in the same fashion as [power7-64] and > [power9-64]. Bootstrapped and tested on powerpc64le-linux-gnu, and on > powerpc64-linux-gnu with -m32/-m64. Is this okay for trunk? > > Thanks! > Bill > > > 2021-11-16 Bill Schmidt > > gcc/ > * config/rs6000/rs6000-builtin-new.def: Add power6-64 stanza. > Move CMPB to power6-64 stanza. > * config/rs6000/rs6000-call.c (rs6000_invalid_new_builtin): Handle > ENB_P6_64 case. > (rs6000_new_builtin_is_supported): Likewise. > (rs6000_expand_new_builtin): Likewise. > (rs6000_init_builtins): Likewise. > * config/rs6000/rs6000-gen-builtins.c (bif_stanza): Add > BSTZ_P6_64. > (stanza_map): Add entry mapping power6-64 to BSTZ_P6_64. > (enable_string): Add "ENB_P6_64". > (write_decls): Add ENB_P6_64 to bif_enable enum. > --- > gcc/config/rs6000/rs6000-builtin-new.def | 9 ++--- > gcc/config/rs6000/rs6000-call.c | 10 ++ > gcc/config/rs6000/rs6000-gen-builtins.c | 4 > 3 files changed, 20 insertions(+), 3 deletions(-) > > diff --git a/gcc/config/rs6000/rs6000-builtin-new.def > b/gcc/config/rs6000/rs6000-builtin-new.def > index 1dd8f6b40b2..58dfce1ca37 100644 > --- a/gcc/config/rs6000/rs6000-builtin-new.def > +++ b/gcc/config/rs6000/rs6000-builtin-new.def > @@ -266,13 +266,16 @@ > > ; Power6 builtins (ISA 2.05). > [power6] > - const signed long __builtin_p6_cmpb (signed long, signed long); > -CMPB cmpbdi3 {} > - >const signed int __builtin_p6_cmpb_32 (signed int, signed int); > CMPB_32 cmpbsi3 {} > > > +; Power6 builtins requiring 64-bit GPRs (even with 32-bit addressing). 
> +[power6-64] > + const signed long __builtin_p6_cmpb (signed long, signed long); > +CMPB cmpbdi3 {} > + > + > ; AltiVec builtins. > [altivec] >const vsc __builtin_altivec_abs_v16qi (vsc); > diff --git a/gcc/config/rs6000/rs6000-call.c b/gcc/config/rs6000/rs6000-call.c > index 83e1abb6118..822a9736591 100644 > --- a/gcc/config/rs6000/rs6000-call.c > +++ b/gcc/config/rs6000/rs6000-call.c > @@ -11919,6 +11919,10 @@ rs6000_invalid_new_builtin (enum rs6000_gen_builtins > fncode) > case ENB_P6: >error ("%qs requires the %qs option", name, "-mcpu=power6"); >break; > +case ENB_P6_64: > + error ("%qs requires the %qs option and either the %qs or %qs option", > + name, "-mcpu=power6", "-m64", "-mpowerpc64"); > + break; > case ENB_ALTIVEC: >error ("%qs requires the %qs option", name, "-maltivec"); >break; > @@ -13346,6 +13350,8 @@ rs6000_new_builtin_is_supported (enum > rs6000_gen_builtins fncode) >return TARGET_POPCNTB; > case ENB_P6: >return TARGET_CMPB; > +case ENB_P6_64: > + return TARGET_CMPB && TARGET_POWERPC64; > case ENB_P7: >return TARGET_POPCNTD; > case ENB_P7_64: > @@ -15697,6 +15703,8 @@ rs6000_expand_new_builtin (tree exp, rtx target, >if (!(e == ENB_ALWAYS > || (e == ENB_P5 && TARGET_POPCNTB) > || (e == ENB_P6 && TARGET_CMPB) > + || (e == ENB_P6_64 && TARGET_CMPB > + && TARGET_POWERPC64) > || (e == ENB_ALTIVEC&& TARGET_ALTIVEC) > || (e == ENB_CELL && TARGET_ALTIVEC > && rs6000_cpu == PROCESSOR_CELL) > @@ -16419,6 +16427,8 @@ rs6000_init_builtins (void) > continue; > if (e == ENB_P6 && !TARGET_CMPB) > continue; > + if (e == ENB_P6_64 && !(TARGET_CMPB && TARGET_POWERPC64)) > + continue; > if (e == ENB_ALTIVEC && !TARGET_ALTIVEC) > continue; > if (e == ENB_VSX && !TARGET_VSX) > diff --git a/gcc/config/rs6000/rs6000-gen-builtins.c > b/gcc/config/rs6000/rs6000-gen-builtins.c > index 1655a2fd765..4ce83bd2290 100644 > --- a/gcc/config/rs6000/rs6000-gen-builtins.c > +++ b/gcc/config/rs6000/rs6000-gen-builtins.c > @@ -212,6 +212,7 @@ enum bif_stanza > BSTZ_ALWAYS, > 
BSTZ_P5, > BSTZ_P6, > + BSTZ_P6_64, > BSTZ_ALTIVEC, > BSTZ_CELL, > BSTZ_VSX, > @@ -245,6 +246,7 @@ static stanza_entry stanza_map[NUMBIFSTANZAS] = > { "always", BSTZ_ALWAYS }, > { "power5", BSTZ_P5 }, > { "power6", BSTZ_P6 }, > +{ "power6-64", BSTZ_P6_64 }, > { "altivec", BSTZ_ALTIVEC}, > { "cell",BSTZ_CELL }, > { "vsx", BSTZ_VSX}, > @@ -269,6 +271,7 @@ static const char *enable_string[NUMBIFSTANZAS] = > "ENB_ALWAYS", > "ENB_P5", > "ENB_P6", > +"ENB_P6_64", >
[PATCH] rs6000: Better error messages for power8/9-vector builtins
Hi!

During a previous patch review, Segher asked that I provide better messages
when builtins are unavailable because they require both a minimum CPU and the
enablement of VSX instructions.  This patch does just that.

Bootstrapped and tested on powerpc64le-linux-gnu with no regressions.  Is this
okay for trunk?

Thanks!
Bill


2021-11-11  Bill Schmidt

gcc/
        * config/rs6000/rs6000-call.c (rs6000_invalid_new_builtin): Change
        error messages for ENB_P8V and ENB_P9V.
---
 gcc/config/rs6000/rs6000-call.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-call.c b/gcc/config/rs6000/rs6000-call.c
index 85fec80c6d7..035266eb001 100644
--- a/gcc/config/rs6000/rs6000-call.c
+++ b/gcc/config/rs6000/rs6000-call.c
@@ -11943,7 +11943,8 @@ rs6000_invalid_new_builtin (enum rs6000_gen_builtins fncode)
       error ("%qs requires the %qs option", name, "-mcpu=power8");
       break;
     case ENB_P8V:
-      error ("%qs requires the %qs option", name, "-mpower8-vector");
+      error ("%qs requires the %qs and %qs options", name, "-mcpu=power8",
+             "-mvsx");
       break;
     case ENB_P9:
       error ("%qs requires the %qs option", name, "-mcpu=power9");
@@ -11953,7 +11954,8 @@ rs6000_invalid_new_builtin (enum rs6000_gen_builtins fncode)
              name, "-mcpu=power9", "-m64", "-mpowerpc64");
       break;
     case ENB_P9V:
-      error ("%qs requires the %qs option", name, "-mpower9-vector");
+      error ("%qs requires the %qs and %qs options", name, "-mcpu=power9",
+             "-mvsx");
       break;
     case ENB_IEEE128_HW:
       error ("%qs requires ISA 3.0 IEEE 128-bit floating point", name);
-- 
2.27.0
Re: [musl] Re: [PATCH v2] configure: define TARGET_LIBC_GNUSTACK on musl
On Tue, Nov 16, 2021 at 8:41 AM Rich Felker wrote: > > On Tue, Nov 16, 2021 at 03:40:00PM +0100, Dragan Mladjenovic wrote: > > Hi, > > > > Looks fine to me. If possible, maybe it should even be back-ported > > to stable branches. The change cherry-picks fine onto 10.x and 11.x branches. Should I send out separate patches or can the committer of this patch apply it to 10.x and 11.x? > > > > Not sure if MIPS assembly sources (if any) in musl would need > > explicit ..note.GNU-stack > > > > to complement this? > > What are the actual consequences of making this change, and what is > the goal? I'm concerned that it might produce object files which don't > include annotation that they don't need executable stack, in which > case the final executable file will be marked as executable-stack and > the kernel will load it as such. That would be very bad. It is actually the other way around - for MIPS hard-float targets on non-glibc (or glibc < 2.31) without this change the .note.GNU-stack annotation is not emitted by GCC. Ilya > > Rich > > > > On 16-Nov-21 06:13, Ilya Lipnitskiy wrote: > > >musl only uses PT_GNU_STACK to set default thread stack size and has no > > >executable stack support[0], so there is no reason not to emit the > > >.note.GNU-stack section on musl builds. > > > > > >[0]: > > >https://lore.kernel.org/all/20190423192534.gn23...@brightrain.aerifal.cx/T/#u > > > > > >gcc/ChangeLog: > > > > > > * configure: Regenerate. > > > * configure.ac: define TARGET_LIBC_GNUSTACK on musl > > > > > >Signed-off-by: Ilya Lipnitskiy > > >--- > > > gcc/configure| 3 +++ > > > gcc/configure.ac | 3 +++ > > > 2 files changed, 6 insertions(+) > > > > > >diff --git a/gcc/configure b/gcc/configure > > >index 74b9d9be4c85..7091a838aefa 100755 > > >--- a/gcc/configure > > >+++ b/gcc/configure > > >@@ -31275,6 +31275,9 @@ fi > > > # Check if the target LIBC handles PT_GNU_STACK. 
> > > gcc_cv_libc_gnustack=unknown > > > case "$target" in > > >+ mips*-*-linux-musl*) > > >+gcc_cv_libc_gnustack=yes > > >+;; > > >mips*-*-linux*) > > > if test $glibc_version_major -gt 2 \ > > >diff --git a/gcc/configure.ac b/gcc/configure.ac > > >index c9ee1fb8919e..8a2d34179a75 100644 > > >--- a/gcc/configure.ac > > >+++ b/gcc/configure.ac > > >@@ -6961,6 +6961,9 @@ fi > > > # Check if the target LIBC handles PT_GNU_STACK. > > > gcc_cv_libc_gnustack=unknown > > > case "$target" in > > >+ mips*-*-linux-musl*) > > >+gcc_cv_libc_gnustack=yes > > >+;; > > >mips*-*-linux*) > > > GCC_GLIBC_VERSION_GTE_IFELSE([2], [31], [gcc_cv_libc_gnustack=yes], ) > > > ;;
Re: [PATCH 4/5] if-conv: Apply VN to hoisted conversions
Richard Biener via Gcc-patches writes: > On Mon, Nov 15, 2021 at 3:00 PM Richard Sandiford > wrote: >> >> Richard Biener via Gcc-patches writes: >> > On Fri, Nov 12, 2021 at 7:05 PM Richard Sandiford via Gcc-patches >> > wrote: >> >> >> >> This patch is a prerequisite for a later one. At the moment, >> >> if-conversion converts predicated POINTER_PLUS_EXPRs into >> >> non-wrapping forms, which for: >> >> >> >> … = base + offset >> >> >> >> becomes: >> >> >> >> tmp = (unsigned long) base >> >> … = tmp + offset >> >> >> >> It then hoists these conversions out of the loop where possible. >> >> >> >> However, because “base” is a valid gimple operand, there can be >> >> multiple POINTER_PLUS_EXPRs with the same base, which can in turn >> >> lead to multiple instances of the same conversion. The later VN pass >> >> is (and I think needs to be) restricted to the new if-converted code, >> >> whereas here we're deliberately inserting the conversions before the >> >> .LOOP_VECTORIZED condition: >> >> >> >> /* If we versioned loop then make sure to insert invariant >> >>stmts before the .LOOP_VECTORIZED check since the vectorizer >> >>will re-use that for things like runtime alias versioning >> >>whose condition can end up using those invariants. */ >> >> >> >> We can therefore enter the vectoriser with redundant conversions. >> >> >> >> The easiest fix seemed to be to defer the hoisting until after VN. >> >> This catches other hoisting opportunities too. >> >> >> >> Hoisting the code from the (artificial) loop in pr99102.c means >> >> that it's no longer worth vectorising. The patch forces vectorisation >> >> instead of relying on the cost model. >> >> >> >> The patch also reverts pr87007-4.c and pr87007-5.c back to their >> >> original forms, undoing changes in 783dc66f9ccb0019c3dad. 
>> >> The code at the time the tests were added was: >> >> >> >> testl %edi, %edi >> >> je .L10 >> >> vxorps %xmm1, %xmm1, %xmm1 >> >> vsqrtsd d3(%rip), %xmm1, %xmm0 >> >> vsqrtsd d2(%rip), %xmm1, %xmm1 >> >> ... >> >> .L10: >> >> ret >> >> >> >> with the operations being hoisted, and the vxorps was specifically >> >> wanted (compared to the previous code). This patch restores the code >> >> to that form, with the hoisted operations and the vxorps. >> >> >> >> Regstrapped on aarch64-linux-gnu and x86_64-linux-gnu. OK to install? >> >> >> >> Richard >> >> >> >> >> >> gcc/ >> >> * tree-if-conv.c: Include tree-eh.h. >> >> (predicate_statements): Remove pe argument. Don't hoist >> >> statements here. >> >> (combine_blocks): Remove pe argument. >> >> (ifcvt_can_hoist, ifcvt_can_hoist_further): New functions. >> >> (ifcvt_hoist_invariants): Likewise. >> >> (tree_if_conversion): Update call to combine_blocks. Call >> >> ifcvt_hoist_invariants after VN. >> >> >> >> gcc/testsuite/ >> >> * gcc.dg/vect/pr99102.c: Add -fno-vect-cost-model. >> >> >> >> Revert: >> >> >> >> 2020-09-09 Richard Biener >> >> >> >> * gcc.target/i386/pr87007-4.c: Adjust. >> >> * gcc.target/i386/pr87007-5.c: Likewise. 
>> >> --- >> >> gcc/testsuite/gcc.dg/vect/pr99102.c | 2 +- >> >> gcc/testsuite/gcc.target/i386/pr87007-4.c | 2 +- >> >> gcc/testsuite/gcc.target/i386/pr87007-5.c | 2 +- >> >> gcc/tree-if-conv.c| 122 -- >> >> 4 files changed, 114 insertions(+), 14 deletions(-) >> >> >> >> diff --git a/gcc/testsuite/gcc.dg/vect/pr99102.c >> >> b/gcc/testsuite/gcc.dg/vect/pr99102.c >> >> index 6c1a13f0783..0d030d15c86 100644 >> >> --- a/gcc/testsuite/gcc.dg/vect/pr99102.c >> >> +++ b/gcc/testsuite/gcc.dg/vect/pr99102.c >> >> @@ -1,4 +1,4 @@ >> >> -/* { dg-options "-O2 -ftree-vectorize -fdump-tree-vect-details" } */ >> >> +/* { dg-options "-O2 -ftree-vectorize -fno-vect-cost-model >> >> -fdump-tree-vect-details" } */ >> >> /* { dg-additional-options "-msve-vector-bits=256" { target >> >> aarch64_sve256_hw } } */ >> >> long a[44]; >> >> short d, e = -7; >> >> diff --git a/gcc/testsuite/gcc.target/i386/pr87007-4.c >> >> b/gcc/testsuite/gcc.target/i386/pr87007-4.c >> >> index 9c4b8005af3..e91bdcbac44 100644 >> >> --- a/gcc/testsuite/gcc.target/i386/pr87007-4.c >> >> +++ b/gcc/testsuite/gcc.target/i386/pr87007-4.c >> >> @@ -15,4 +15,4 @@ foo (int n, int k) >> >>d1 = ceil (d3); >> >> } >> >> >> >> -/* { dg-final { scan-assembler-times "vxorps\[^\n\r\]*xmm\[0-9\]" 0 } } >> >> */ >> >> +/* { dg-final { scan-assembler-times "vxorps\[^\n\r\]*xmm\[0-9\]" 1 } } >> >> */ >> >> diff --git a/gcc/testsuite/gcc.target/i386/pr87007-5.c >> >> b/gcc/testsuite/gcc.target/i386/pr87007-5.c >> >> index e4d956a5d7f..20d13cf650b 100644 >> >> --- a/gcc/testsuite/gcc.target/i386/pr87007-5.c >> >> +++ b/gcc/testsuite/gcc.target/i386/pr87007-5.c >> >> @@ -15,4 +15,4 @@ foo (int n, int k) >> >>
Re: [PATCH] rs6000: MMA test case emits wrong code when building a vector pair
On 11/13/21 7:25 AM, Segher Boessenkool wrote:
> On Wed, Oct 27, 2021 at 08:37:57PM -0500, Peter Bergner wrote:
>> PR102976 shows a test case where we generate wrong code when building
>> a vector pair from 2 vector registers.  The bug here is that with unlucky
>> register assignments, we can clobber one of the input operands before
>> we write both registers of the output operand.  The solution is to use
>> early-clobbers in the assemble pair and accumulator patterns.
>
> Because of what insns there are after the split.  Aha.
>
> Please add a comment explaining this, near the earlyclobber itself.

Done for both patterns.

> You can just write this as {\mxxlor \d+,44,44\M} etc., that will be
> simplest I think.

Done and tested that it still works.

> Okay for trunk with comments added near the earlyclobber, and the RE
> improved.  Also fine for 11 after some burn-in.  Thanks!

Ok, I pushed with both changes.  I'll push a change to GCC11 in a few days.
Thanks!

Peter
[PATCH] x86: Add -mharden-sls=[none|all|return|indirect-branch]
Add -mharden-sls= to mitigate against straight line speculation (SLS)
for function return and indirect branch by adding an INT3 instruction
after function return and indirect branch.

gcc/

        PR target/102952
        * config/i386/i386-opts.h (harden_sls): New enum.
        * config/i386/i386.c (output_indirect_thunk): Mitigate against
        SLS for function return.
        (ix86_output_function_return): Likewise.
        (ix86_output_jmp_thunk_or_indirect): Mitigate against indirect
        branch.
        (ix86_output_indirect_jmp): Likewise.
        (ix86_output_call_insn): Likewise.
        * config/i386/i386.opt: Add -mharden-sls=.
        * doc/invoke.texi: Document -mharden-sls=.

gcc/testsuite/

        PR target/102952
        * gcc.target/i386/harden-sls-1.c: New test.
        * gcc.target/i386/harden-sls-2.c: Likewise.
        * gcc.target/i386/harden-sls-3.c: Likewise.
        * gcc.target/i386/harden-sls-4.c: Likewise.
---
 gcc/config/i386/i386-opts.h                  |  7 +
 gcc/config/i386/i386.c                       | 30
 gcc/config/i386/i386.opt                     | 20 +
 gcc/doc/invoke.texi                          | 10 ++-
 gcc/testsuite/gcc.target/i386/harden-sls-1.c | 14 +
 gcc/testsuite/gcc.target/i386/harden-sls-2.c | 14 +
 gcc/testsuite/gcc.target/i386/harden-sls-3.c | 14 +
 gcc/testsuite/gcc.target/i386/harden-sls-4.c | 14 +
 8 files changed, 116 insertions(+), 7 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/harden-sls-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/harden-sls-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/harden-sls-3.c
 create mode 100644 gcc/testsuite/gcc.target/i386/harden-sls-4.c

diff --git a/gcc/config/i386/i386-opts.h b/gcc/config/i386/i386-opts.h
index 04e4ad608fb..171d3106d0a 100644
--- a/gcc/config/i386/i386-opts.h
+++ b/gcc/config/i386/i386-opts.h
@@ -121,4 +121,11 @@ enum instrument_return {
   instrument_return_nop5
 };

+enum harden_sls {
+  harden_sls_none = 0,
+  harden_sls_return = 1 << 0,
+  harden_sls_indirect_branch = 1 << 1,
+  harden_sls_all = harden_sls_return | harden_sls_indirect_branch
+};
+
 #endif
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index cc9f9322fad..0a902d66321
100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -5914,6 +5914,8 @@ output_indirect_thunk (unsigned int regno)
     }

   fputs ("\tret\n", asm_out_file);
+  if ((ix86_harden_sls & harden_sls_return))
+    fputs ("\tint3\n", asm_out_file);
 }

 /* Output a funtion with a call and return thunk for indirect branch.
@@ -15987,6 +15989,8 @@ ix86_output_jmp_thunk_or_indirect (const char *thunk_name, const int regno)
       fprintf (asm_out_file, "\tjmp\t");
       assemble_name (asm_out_file, thunk_name);
       putc ('\n', asm_out_file);
+      if ((ix86_harden_sls & harden_sls_indirect_branch))
+        fputs ("\tint3\n", asm_out_file);
     }
   else
     output_indirect_thunk (regno);
@@ -16212,10 +16216,14 @@ ix86_output_indirect_jmp (rtx call_op)
         gcc_unreachable ();

       ix86_output_indirect_branch (call_op, "%0", true);
-      return "";
+      if ((ix86_harden_sls & harden_sls_indirect_branch))
+        return "int3";
+      else
+        return "";
     }
   else
-    return "%!jmp\t%A0";
+    return ((ix86_harden_sls & harden_sls_indirect_branch)
+            ? "%!jmp\t%A0\n\tint3" : "%!jmp\t%A0");
 }

 /* Output return instrumentation for current function if needed. */
@@ -16283,10 +16291,15 @@ ix86_output_function_return (bool long_p)
       return "";
     }

-  if (!long_p)
-    return "%!ret";
+  if ((ix86_harden_sls & harden_sls_return))
+    return "%!ret\n\tint3";
+  else
+    {
+      if (!long_p)
+        return "%!ret";

-  return "rep%; ret";
+      return "rep%; ret";
+    }
 }

 /* Output indirect function return.
RET_OP is the function return
@@ -16381,7 +16394,12 @@ ix86_output_call_insn (rtx_insn *insn, rtx call_op)
   if (output_indirect_p && !direct_p)
     ix86_output_indirect_branch (call_op, xasm, true);
   else
-    output_asm_insn (xasm, &call_op);
+    {
+      output_asm_insn (xasm, &call_op);
+      if (!direct_p
+          && (ix86_harden_sls & harden_sls_indirect_branch))
+        return "int3";
+    }

   return "";
 }
diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt
index b38ac13fc91..c5452c49597 100644
--- a/gcc/config/i386/i386.opt
+++ b/gcc/config/i386/i386.opt
@@ -1121,6 +1121,26 @@ mrecord-return
 Target Var(ix86_flag_record_return) Init(0)
 Generate a __return_loc section pointing to all return instrumentation code.

+mharden-sls=
+Target RejectNegative Joined Enum(harden_sls) Var(ix86_harden_sls) Init(harden_sls_none)
+Generate code to mitigate against straight line speculation.
+
+Enum
+Name(harden_sls) Type(enum harden_sls)
+Known choices for mitigation against straight line speculation with -mharden-sls
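Since the option value is a bit mask, each output routine in the patch only tests the bit it cares about. A minimal sketch of that pattern follows. The enum is copied from the i386-opts.h hunk; the two helper functions and their simplified templates are hypothetical (the real templates use "%!" and operand codes such as "%A0", and the "*%rax" operand here is only an illustrative choice).

```c
#include <assert.h>
#include <string.h>

/* Mirrors the enum added to i386-opts.h by the patch.  */
enum harden_sls
{
  harden_sls_none = 0,
  harden_sls_return = 1 << 0,
  harden_sls_indirect_branch = 1 << 1,
  harden_sls_all = harden_sls_return | harden_sls_indirect_branch
};

/* Hypothetical helper: assembly text for a function return.  An int3
   follows the ret only when return hardening is selected.  */
static const char *
return_template_sketch (int harden_mask)
{
  return (harden_mask & harden_sls_return) ? "ret\n\tint3" : "ret";
}

/* Hypothetical helper: assembly text for an indirect jump through
   %rax, hardened only when the indirect-branch bit is set.  */
static const char *
indirect_jmp_template_sketch (int harden_mask)
{
  return (harden_mask & harden_sls_indirect_branch)
         ? "jmp\t*%rax\n\tint3" : "jmp\t*%rax";
}
```

Defining harden_sls_all as the OR of the two individual bits means no routine needs a special case for "all"; the same bit tests cover it.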
[PATCH] x86: Add -mindirect-branch-cs-prefix
Add -mindirect-branch-cs-prefix to add CS prefix to call and jmp to thunk via r8-r15 registers when converting indirect call and jump to increase the instruction length to 6, allowing the non-thunk form to be inlined. gcc/ PR target/102952 * config/i386/i386.c (ix86_output_jmp_thunk_or_indirect): Emit CS prefix for -mindirect-branch-cs-prefix. (ix86_output_indirect_branch_via_reg): Likewise. * config/i386/i386.opt: Add -mindirect-branch-cs-prefix. * doc/invoke.texi: Document -mindirect-branch-cs-prefix. gcc/testsuite/ PR target/102952 * gcc.target/i386/indirect-thunk-cs-prefix-1.c: New test. * gcc.target/i386/indirect-thunk-cs-prefix-2.c: Likewise. --- gcc/config/i386/i386.c| 6 ++ gcc/config/i386/i386.opt | 4 gcc/doc/invoke.texi | 8 +++- .../gcc.target/i386/indirect-thunk-cs-prefix-1.c | 14 ++ .../gcc.target/i386/indirect-thunk-cs-prefix-2.c | 15 +++ 5 files changed, 46 insertions(+), 1 deletion(-) create mode 100644 gcc/testsuite/gcc.target/i386/indirect-thunk-cs-prefix-1.c create mode 100644 gcc/testsuite/gcc.target/i386/indirect-thunk-cs-prefix-2.c diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index 7e9b7bc347f..0a902d66321 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -15983,6 +15983,9 @@ ix86_output_jmp_thunk_or_indirect (const char *thunk_name, const int regno) { if (thunk_name != NULL) { + if (regno >= FIRST_REX_INT_REG + && ix86_indirect_branch_cs_prefix) + fprintf (asm_out_file, "\tcs\n"); fprintf (asm_out_file, "\tjmp\t"); assemble_name (asm_out_file, thunk_name); putc ('\n', asm_out_file); @@ -16036,6 +16039,9 @@ ix86_output_indirect_branch_via_reg (rtx call_op, bool sibcall_p) { if (thunk_name != NULL) { + if (regno >= FIRST_REX_INT_REG + && ix86_indirect_branch_cs_prefix) + fprintf (asm_out_file, "\tcs\n"); fprintf (asm_out_file, "\tcall\t"); assemble_name (asm_out_file, thunk_name); putc ('\n', asm_out_file); diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt index 8d499a5a4df..c5452c49597 100644 --- 
a/gcc/config/i386/i386.opt +++ b/gcc/config/i386/i386.opt @@ -1076,6 +1076,10 @@ Enum(indirect_branch) String(thunk-inline) Value(indirect_branch_thunk_inline) EnumValue Enum(indirect_branch) String(thunk-extern) Value(indirect_branch_thunk_extern) +mindirect-branch-cs-prefix +Target Var(ix86_indirect_branch_cs_prefix) Init(0) +Add CS prefix to call and jmp to thunk when converting indirect call and jump. + mindirect-branch-register Target Var(ix86_indirect_branch_register) Init(0) Force indirect call and jump via register. diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index f3b4b467765..c992a7152f5 100644 --- a/gcc/doc/invoke.texi +++ b/gcc/doc/invoke.texi @@ -1425,7 +1425,8 @@ See RS/6000 and PowerPC Options. -mstack-protector-guard-symbol=@var{symbol} @gol -mgeneral-regs-only -mcall-ms2sysv-xlogues -mrelax-cmpxchg-loop @gol -mindirect-branch=@var{choice} -mfunction-return=@var{choice} @gol --mindirect-branch-register -mharden-sls=@var{choice} -mneeded} +-mindirect-branch-register -mharden-sls=@var{choice} @gol +-mindirect-branch-cs-prefix -mneeded} @emph{x86 Windows Options} @gccoptlist{-mconsole -mcygwin -mno-cygwin -mdll @gol @@ -32390,6 +32391,11 @@ hardening. @samp{return} enables SLS hardening for function return. @samp{indirect-branch} enables SLS hardening for indirect branch. @samp{all} enables all SLS hardening. +@item -mindirect-branch-cs-prefix +@opindex mindirect-branch-cs-prefix +Add CS prefix to call and jmp to thunk via r8-r15 registers when +converting indirect call and jump. + @end table These @samp{-m} switches are supported in addition to the above diff --git a/gcc/testsuite/gcc.target/i386/indirect-thunk-cs-prefix-1.c b/gcc/testsuite/gcc.target/i386/indirect-thunk-cs-prefix-1.c new file mode 100644 index 000..db2f3416823 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/indirect-thunk-cs-prefix-1.c @@ -0,0 +1,14 @@ +/* { dg-do compile { target { ! 
ia32 } } } */ +/* { dg-options "-O2 -ffixed-rax -ffixed-rbx -ffixed-rcx -ffixed-rdx -ffixed-rdi -ffixed-rsi -mindirect-branch-cs-prefix -mindirect-branch=thunk-extern" } */ +/* { dg-additional-options "-fno-pic" { target { ! *-*-darwin* } } } */ + +extern void (*fptr) (void); + +void +foo (void) +{ + fptr (); +} + +/* { dg-final { scan-assembler-times "jmp\[ \t\]+_?__x86_indirect_thunk_r\[0-9\]+" 1 } } */ +/* { dg-final { scan-assembler-times "\tcs" 1 } } */ diff --git a/gcc/testsuite/gcc.target/i386/indirect-thunk-cs-prefix-2.c b/gcc/testsuite/gcc.target/i386/indirect-thunk-cs-prefix-2.c new file mode 100644 index 000..adfc39a49d4 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/indir
[PATCH v2] rs6000: Fix a handful of 32-bit built-in function problems
Hi!

I previously posted [1] to correct some problems with the new builtins support targeting 32-bit code gen. Based on the discussion, I've made some adjustments and would like to submit this for consideration.

We eventually agreed that the strange behavior for -m32 -mpowerpc64 for certain HTM builtins should be removed. All of the registers TEXASR, TEXASRU, TFHAR, and TFIAR are now accessed using the unsigned long data type in all configurations.

Segher didn't like the change in the error message for the cmpb-3.c test case, but I think this should be fine. The test case just tests for the error message, but there is also a "note" message that provides additional information. The diagnostics that the user sees will look like this:

cmpb-3.c:11:3: error: '__builtin_p6_cmpb' requires the '-mcpu=power6' option and either the '-m64' or '-mpowerpc64' option
cmpb-3.c:11:3: note: builtin '__builtin_cmpb' requires builtin '__builtin_p6_cmpb'

So it's clear to the user that their use of __builtin_cmpb at line 11 triggered the error.

Bootstrapped and tested on powerpc64le-linux-gnu, and on powerpc64-linux-gnu using -m32/-m64. Is this okay for trunk?

Thanks!
Bill

[1] https://gcc.gnu.org/pipermail/gcc-patches/2021-November/583905.html

2021-11-16  Bill Schmidt

gcc/
	* config/rs6000/rs6000-builtin-new.def (CMPB): Flag as no32bit.
	(BPERMD): Flag as 32bit (needing special handling for 32-bit).
	(UNPACK_TD): Return unsigned long long instead of unsigned long.
	(GET_TEXASR): Return unsigned long instead of unsigned long long.
	(GET_TEXASRU): Likewise.
	(GET_TFHAR): Likewise.
	(GET_TFIAR): Likewise.
	(SET_TEXASR): Pass unsigned long instead of unsigned long long.
	(SET_TEXASRU): Likewise.
	(SET_TFHAR): Likewise.
	(SET_TFIAR): Likewise.
	(TABORTDC): Likewise.
	(TABORTDCI): Likewise.
	* config/rs6000/rs6000-call.c (rs6000_expand_new_builtin): Fix error handling for no32bit. Add 32bit handling for RS6000_BIF_BPERMD.

gcc/testsuite/
	* gcc.target/powerpc/cmpb-3.c: Adjust error message.
--- gcc/config/rs6000/rs6000-builtin-new.def | 30 +++ gcc/config/rs6000/rs6000-call.c | 9 --- gcc/testsuite/gcc.target/powerpc/cmpb-3.c | 2 +- 3 files changed, 22 insertions(+), 19 deletions(-) diff --git a/gcc/config/rs6000/rs6000-builtin-new.def b/gcc/config/rs6000/rs6000-builtin-new.def index 58dfce1ca37..30556e5c7f2 100644 --- a/gcc/config/rs6000/rs6000-builtin-new.def +++ b/gcc/config/rs6000/rs6000-builtin-new.def @@ -273,7 +273,7 @@ ; Power6 builtins requiring 64-bit GPRs (even with 32-bit addressing). [power6-64] const signed long __builtin_p6_cmpb (signed long, signed long); -CMPB cmpbdi3 {} +CMPB cmpbdi3 {no32bit} ; AltiVec builtins. @@ -2018,7 +2018,7 @@ ADDG6S addg6s {} const signed long __builtin_bpermd (signed long, signed long); -BPERMD bpermd_di {} +BPERMD bpermd_di {32bit} const unsigned int __builtin_cbcdtd (unsigned int); CBCDTD cbcdtd {} @@ -2971,7 +2971,7 @@ void __builtin_set_fpscr_drn (const int[0,7]); SET_FPSCR_DRN rs6000_set_fpscr_drn {} - const unsigned long __builtin_unpack_dec128 (_Decimal128, const int<1>); + const unsigned long long __builtin_unpack_dec128 (_Decimal128, const int<1>); UNPACK_TD unpacktd {} @@ -3014,39 +3014,39 @@ [htm] - unsigned long long __builtin_get_texasr (); + unsigned long __builtin_get_texasr (); GET_TEXASR nothing {htm,htmspr} - unsigned long long __builtin_get_texasru (); + unsigned long __builtin_get_texasru (); GET_TEXASRU nothing {htm,htmspr} - unsigned long long __builtin_get_tfhar (); + unsigned long __builtin_get_tfhar (); GET_TFHAR nothing {htm,htmspr} - unsigned long long __builtin_get_tfiar (); + unsigned long __builtin_get_tfiar (); GET_TFIAR nothing {htm,htmspr} - void __builtin_set_texasr (unsigned long long); + void __builtin_set_texasr (unsigned long); SET_TEXASR nothing {htm,htmspr} - void __builtin_set_texasru (unsigned long long); + void __builtin_set_texasru (unsigned long); SET_TEXASRU nothing {htm,htmspr} - void __builtin_set_tfhar (unsigned long long); + void __builtin_set_tfhar (unsigned 
long); SET_TFHAR nothing {htm,htmspr} - void __builtin_set_tfiar (unsigned long long); + void __builtin_set_tfiar (unsigned long); SET_TFIAR nothing {htm,htmspr} unsigned int __builtin_tabort (unsigned int); TABORT tabort {htm,htmcr} - unsigned int __builtin_tabortdc (unsigned long long, unsigned long long, \ - unsigned long long); + unsigned int __builtin_tabortdc (unsigned long, unsigned long, \ + unsigned long); TABORTDC tabortdc {htm,htmcr} - unsigned int __builtin_tabortdci (unsigned long long, unsigned long long, \ -unsigned long long); + unsi
[r12-5292 Regression] FAIL: gcc.dg/tree-ssa/modref-dse-5.c scan-tree-dump dse2 "Deleted dead store: wrap" on Linux/x86_64
On Linux/x86_64,

e69b7c5779863469479698f863ab25e0d9b4586e is the first bad commit
commit e69b7c5779863469479698f863ab25e0d9b4586e
Author: Jan Hubicka
Date:   Tue Nov 16 09:15:39 2021 +0100

    Fix uninitialized access in merge_call_side_effects

caused

FAIL: gcc.dg/tree-ssa/modref-dse-5.c scan-tree-dump dse2 "Deleted dead store: wrap"

with GCC configured with

../../gcc/configure --prefix=/local/skpandey/gccwork/toolwork/gcc-bisect-master/master/r12-5292/usr --enable-clocale=gnu --with-system-zlib --with-demangler-in-ld --with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl --enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check RUNTESTFLAGS="tree-ssa.exp=gcc.dg/tree-ssa/modref-dse-5.c --target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="tree-ssa.exp=gcc.dg/tree-ssa/modref-dse-5.c --target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="tree-ssa.exp=gcc.dg/tree-ssa/modref-dse-5.c --target_board='unix{-m64}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="tree-ssa.exp=gcc.dg/tree-ssa/modref-dse-5.c --target_board='unix{-m64\ -march=cascadelake}'"

(Please do not reply to this email. For questions about this report, contact me at skpgkp2 at gmail dot com.)
[r12-5301 Regression] FAIL: gcc.dg/tree-ssa/if-to-switch-3.c scan-tree-dump iftoswitch "Condition chain with [^\n\r]* BBs transformed into a switch statement." on Linux/x86_64
On Linux/x86_64,

045206450386bcd774db3bde0c696828402361c6 is the first bad commit
commit 045206450386bcd774db3bde0c696828402361c6
Author: Richard Biener
Date:   Fri Nov 12 10:21:22 2021 +0100

    tree-optimization/102880 - improve CD-DCE

caused

FAIL: gcc.dg/tree-ssa/if-to-switch-3.c scan-tree-dump iftoswitch "Condition chain with [^\n\r]* BBs transformed into a switch statement."

with GCC configured with

../../gcc/configure --prefix=/local/skpandey/gccwork/toolwork/gcc-bisect-master/master/r12-5301/usr --enable-clocale=gnu --with-system-zlib --with-demangler-in-ld --with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl --enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check RUNTESTFLAGS="tree-ssa.exp=gcc.dg/tree-ssa/if-to-switch-3.c --target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="tree-ssa.exp=gcc.dg/tree-ssa/if-to-switch-3.c --target_board='unix{-m32\ -march=cascadelake}'"

(Please do not reply to this email. For questions about this report, contact me at skpgkp2 at gmail dot com.)
[pushed] configure, Darwin: Set appropriate defaults for host-shared.
Darwin x86_64 and aarch64 platforms are PIC (shared) by default, and user-space code must be built in this mode. The patch ensures that this is set correctly and applies a default when --enable-host-shared is not set. tested on *-darwin*, x86_64,powerpc64le-linux-gnu, pushed to master, thanks Iain Signed-off-by: Iain Sandoe ChangeLog: * configure: Regenerate. * configure.ac: Ensure that PIC (shared) defaults are set correctly for Darwin. --- configure| 16 +++- configure.ac | 15 ++- 2 files changed, 29 insertions(+), 2 deletions(-) diff --git a/configure b/configure index 58979d6e3b1..3062495da31 100755 --- a/configure +++ b/configure @@ -8447,8 +8447,20 @@ fi # Check whether --enable-host-shared was given. if test "${enable_host_shared+set}" = set; then : enableval=$enable_host_shared; host_shared=$enableval + case $target in + x86_64-*-darwin* | aarch64-*-darwin*) + if test x$host_shared != xyes ; then + # PIC is the default, and actually cannot be switched off. + echo configure.ac: warning: PIC code is required for the configured target, host-shared setting ignored. 1>&2 + host_shared=yes + fi ;; + *) ;; + esac else - host_shared=no + case $target in + x86_64-*-darwin* | aarch64-*-darwin*) host_shared=yes ;; + *) host_shared=no ;; + esac fi @@ -10083,6 +10095,8 @@ done + + # Generate default definitions for YACC, M4, LEX and other programs that run # on the build machine. These are used if the Makefile can't locate these # programs in objdir. diff --git a/configure.ac b/configure.ac index 550e6993b59..bed60bcaf72 100644 --- a/configure.ac +++ b/configure.ac @@ -1859,7 +1859,20 @@ AC_SUBST(extra_linker_plugin_flags) AC_ARG_ENABLE(host-shared, [AS_HELP_STRING([--enable-host-shared], [build host code as shared libraries])], -[host_shared=$enableval], [host_shared=no]) +[host_shared=$enableval + case $target in + x86_64-*-darwin* | aarch64-*-darwin*) + if test x$host_shared != xyes ; then + # PIC is the default, and actually cannot be switched off. 
+ echo configure.ac: warning: PIC code is required for the configured target, host-shared setting ignored. 1>&2 + host_shared=yes + fi ;; + *) ;; + esac], +[case $target in + x86_64-*-darwin* | aarch64-*-darwin*) host_shared=yes ;; + *) host_shared=no ;; + esac]) AC_SUBST(host_shared) # By default, C and C++ are the only stage 1 languages. -- 2.24.3 (Apple Git-128)
[PATCH v2] libcpp: Implement -Wbidi-chars for CVE-2021-42574 [PR103026]
On Mon, Nov 15, 2021 at 06:15:40PM -0500, David Malcolm wrote: > > On Mon, Nov 08, 2021 at 04:33:43PM -0500, Marek Polacek wrote: > > > Ping, can we conclude on the name? IMHO, -Wbidirectional is just fine, > > > but changing the name is a trivial operation. > > > > Here's a patch with a better name (suggested by Jonathan W.). Otherwise no > > changes. > > Thanks for implementing this. > > > > > Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk? > > > > -- >8 -- > > From a link below: > > "An issue was discovered in the Bidirectional Algorithm in the Unicode > > Specification through 14.0. It permits the visual reordering of > > characters via control sequences, which can be used to craft source code > > that renders different logic than the logical ordering of tokens > > ingested by compilers and interpreters. Adversaries can leverage this to > > encode source code for compilers accepting Unicode such that targeted > > vulnerabilities are introduced invisibly to human reviewers." > > > > More info: > > https://nvd.nist.gov/vuln/detail/CVE-2021-42574 > > https://trojansource.codes/ > > > > This is not a compiler bug. However, to mitigate the problem, this patch > > implements -Wbidi-chars=[none|unpaired|any] to warn about possibly > > misleading Unicode bidirectional characters the preprocessor may encounter. > > > > The default is =unpaired, which warns about improperly terminated > > bidirectional characters; e.g. a LRE without its appertaining PDF. The > > I like the default. Great. > Wording nit: maybe use "corresponding" rather than "appertaining"; I > believe the latter has a sense that one is part of the other, when they > are more like peers. OK, fixed. > > level =any warns about any use of bidirectional characters. > > Terminology nit: > The patch is referring to "bidirectional characters", but I think the > term "bidirectional control characters" would be better. Adjusted. 
> For example, a passage of text containing both numbers and characters > in a right-to-left script could be considered "bidirectional", since > the numbers are written from left-to-right. > > Specifically, the patch looks for these specific characters: > * U+202A LEFT-TO-RIGHT EMBEDDING > * U+202B RIGHT-TO-LEFT EMBEDDING > * U+202C POP DIRECTIONAL FORMATTING > * U+202D LEFT-TO-RIGHT OVERRIDE > * U+202E RIGHT-TO-LEFT OVERRIDE > * U+2066 LEFT-TO-RIGHT ISOLATE > * U+2067 RIGHT-TO-LEFT ISOLATE > * U+2068 FIRST STRONG ISOLATE > * U+2069 POP DIRECTIONAL ISOLATE > > However, the following characters could also be considered as > "bidirectional control characters": > * U+200E LEFT-TO-RIGHT MARK (UTF-8: E2 80 8E) > * U+200F RIGHT-TO-LEFT MARK (UTF-8: E2 80 8F) > but aren't checked for in the patch. Should they be? I can imagine > ways in which they could be abused, so I think so. I'd only intended to check the bidi chars described in the original trojan source pdf, but I added checking for U+200E/U+200F too, since it was easy enough. AFAIK they aren't popped by a PDF/PDI like the rest, so don't need to go on the vec, and so we only warn with =any. Tests: Wbidi-chars-16.c + Wbidi-chars-17.c > [...snip...] > > > diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt > > index 06457ac739e..b047df0f125 100644 > > --- a/gcc/c-family/c.opt > > +++ b/gcc/c-family/c.opt > > @@ -374,6 +374,30 @@ Wbad-function-cast > > C ObjC Var(warn_bad_function_cast) Warning > > Warn about casting functions to incompatible types. > > > > +Wbidi-chars > > +C ObjC C++ ObjC++ Warning Alias(Wbidi-chars=,any,none) > > +; > > + > > +Wbidi-chars= > > +C ObjC C++ ObjC++ RejectNegative Joined Warning > > CPP(cpp_warn_bidirectional) CppReason(CPP_W_BIDIRECTIONAL) > > Var(warn_bidirectional) Init(bidirectional_unpaired) > > Enum(cpp_bidirectional_level) > > +-Wbidi-chars=[none|unpaired|any] Warn about UTF-8 bidirectional characters. > > "control characters" Fixed. > [...snip...] 
> > > > > +@item -Wbidi-chars=@r{[}none@r{|}unpaired@r{|}any@r{]} > > +@opindex Wbidi-chars= > > +@opindex Wbidi-chars > > +@opindex Wno-bidi-chars > > +Warn about possibly misleading UTF-8 bidirectional characters in comments, > > (and here again) Fixed. > > +string literals, character constants, and identifiers. Such characters can > > +change left-to-right writing direction into right-to-left (and vice versa), > > +which can cause confusion between the logical order and visual order. This > > +may be dangerous; for instance, it may seem that a piece of code is not > > +commented out, whereas it in fact is. > > + > > +There are three levels of warning supported by GCC@. The default is > > +@option{-Wbidi-chars=unpaired}, which warns about improperly terminated > > +bidi contexts. @option{-Wbidi-chars=none} turns the warning off. > > +@option{-Wbidi-chars=any} warns about any use of bidirectional characters. > > (and again) Fixed. > [...snip...] > > > > diff --git a/gcc/testsuite/c-c++-common/Wbidi-chars
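For readers following the thread, here is a minimal sketch (not the libcpp implementation) of recognising the control characters listed above in a UTF-8 byte string. The names `bidi_control_codepoint` and `count_bidi_controls` are made up for illustration. The sketch only decodes three-byte sequences that start with 0xE2 and does no pairing analysis, so it corresponds loosely to what `=any` would flag.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not taken from the patch: return the code point
   if the three bytes at P encode one of the bidi control characters
   discussed in this thread, or 0 otherwise.  The caller guarantees
   that P[0..2] are readable.  */
static uint32_t
bidi_control_codepoint (const unsigned char *p)
{
  /* All candidates live in U+2000..U+207F, i.e. E2 80 xx or E2 81 xx.  */
  if (p[0] != 0xE2 || (p[1] != 0x80 && p[1] != 0x81))
    return 0;
  if ((p[2] & 0xC0) != 0x80)
    return 0;

  uint32_t cp = 0x2000u | ((uint32_t) (p[1] & 0x3F) << 6) | (p[2] & 0x3F);

  if ((cp >= 0x202A && cp <= 0x202E)      /* LRE, RLE, PDF, LRO, RLO */
      || (cp >= 0x2066 && cp <= 0x2069)   /* LRI, RLI, FSI, PDI */
      || cp == 0x200E || cp == 0x200F)    /* LRM, RLM */
    return cp;
  return 0;
}

/* Count the bidi control characters in the LEN bytes starting at S.  */
size_t
count_bidi_controls (const char *s, size_t len)
{
  const unsigned char *p = (const unsigned char *) s;
  size_t n = 0;

  for (size_t i = 0; i + 2 < len; i++)
    if (bidi_control_codepoint (p + i) != 0)
      {
        n++;
        i += 2;   /* skip the rest of the three-byte sequence */
      }
  return n;
}
```

An `=unpaired`-style check would additionally need a stack of open embeddings, overrides and isolates; that part is left out here.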
Re: [PATCH v4] Fix ICE when mixing VLAs and statement expressions [PR91038]
On 11/16/21 08:48, Uecker, Martin wrote:
On Monday, 08.11.2021, 19:13 +0100, Martin Uecker wrote:
On Monday, 08.11.2021, 12:13 -0500, Jason Merrill wrote:
On 11/7/21 01:40, Uecker, Martin wrote:
On Wednesday, 03.11.2021, 10:18 -0400, Jason Merrill wrote:

...

Thank you! I made these changes and ran bootstrap and tests again.

Hmm, it doesn't look like you made the change to use the save_expr function instead of build1?

Oh, sorry. I wanted to change it and then forgot. Now also with this change (changelog as before).

Ok with this change?

OK.

Best,
Martin

Ok for trunk? Any idea how to fix returning structs with VLA member from statement expressions?

Testcase?

void foo(void)
{
  ({ int N = 3; struct { char x[N]; } x; x; });
}

The difference to the tests in this patch (which I also forgot to include in the last version) is that the object of variable size is returned from the statement expression and not a pointer to it. This cannot happen with arrays because they decay to pointers.

Martin

Otherwise, I will add an error message to the FE in another patch.

Martin

diff --git a/gcc/c-family/c-common.c b/gcc/c-family/c-common.c index 436df45df68..95083f95442 100644 --- a/gcc/c-family/c-common.c +++ b/gcc/c-family/c-common.c @@ -3306,7 +3306,19 @@ pointer_int_sum (location_t loc, enum tree_code resultcode, TREE_TYPE (result_type))) size_exp = integer_one_node; else -size_exp = size_in_bytes_loc (loc, TREE_TYPE (result_type)); +{ + size_exp = size_in_bytes_loc (loc, TREE_TYPE (result_type)); + /* Wrap the pointer expression in a SAVE_EXPR to make sure it +is evaluated first when the size expression may depend +on it for VM types.
*/ + if (TREE_SIDE_EFFECTS (size_exp) + && TREE_SIDE_EFFECTS (ptrop) + && variably_modified_type_p (TREE_TYPE (ptrop), NULL)) + { + ptrop = save_expr (ptrop); + size_exp = build2 (COMPOUND_EXPR, TREE_TYPE (intop), ptrop, size_exp); + } +} /* We are manipulating pointer values, so we don't need to warn about relying on undefined signed overflow. We disable the diff --git a/gcc/gimplify.c b/gcc/gimplify.c index c2ab96e7e18..84f7dc3c248 100644 --- a/gcc/gimplify.c +++ b/gcc/gimplify.c @@ -2964,7 +2964,9 @@ gimplify_var_or_parm_decl (tree *expr_p) declaration, for which we've already issued an error. It would be really nice if the front end wouldn't leak these at all. Currently the only known culprit is C++ destructors, as seen - in g++.old-deja/g++.jason/binding.C. */ + in g++.old-deja/g++.jason/binding.C. + Another possible culpit are size expressions for variably modified + types which are lost in the FE or not gimplified correctly. */ if (VAR_P (decl) && !DECL_SEEN_IN_BIND_EXPR_P (decl) && !TREE_STATIC (decl) && !DECL_EXTERNAL (decl) @@ -3109,16 +3111,22 @@ gimplify_compound_lval (tree *expr_p, gimple_seq *pre_p, gimple_seq *post_p, expression until we deal with any variable bounds, sizes, or positions in order to deal with PLACEHOLDER_EXPRs. - So we do this in three steps. First we deal with the annotations - for any variables in the components, then we gimplify the base, - then we gimplify any indices, from left to right. */ + The base expression may contain a statement expression that + has declarations used in size expressions, so has to be + gimplified before gimplifying the size expressions. + + So we do this in three steps. First we deal with variable + bounds, sizes, and positions, then we gimplify the base, + then we deal with the annotations for any variables in the + components and any indices, from left to right. 
*/ + for (i = expr_stack.length () - 1; i >= 0; i--) { tree t = expr_stack[i]; if (TREE_CODE (t) == ARRAY_REF || TREE_CODE (t) == ARRAY_RANGE_REF) { - /* Gimplify the low bound and element type size and put them into + /* Deal with the low bound and element type size and put them into the ARRAY_REF. If these values are set, they have already been gimplified. */ if (TREE_OPERAND (t, 2) == NULL_TREE) @@ -3127,18 +3135,8 @@ gimplify_compound_lval (tree *expr_p, gimple_seq *pre_p, gimple_seq *post_p, if (!is_gimple_min_invariant (low)) { TREE_OPERAND (t, 2) = low; - tret = gimplify_expr (&TREE_OPERAND (t, 2), pre_p, - post_p, is_gimple_reg, - fb_rvalue); - ret = MIN (ret, tret); } } - else - { - tret = gimplify_expr (&TREE_OPERAND (t, 2), pre_p, post_p, - is_gimple_reg, fb_rvalue); - ret = MIN (ret, tre
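As background for the pointer_int_sum hunk: when a pointer targets a variably modified type, the scale factor used in pointer arithmetic is only known at run time. The following is a minimal standard-C sketch of that fact, not a reproducer for the statement-expression ICE; the function name `vla_row_stride` is hypothetical.

```c
#include <stddef.h>

/* Hypothetical illustration: return the distance in bytes between P and
   P + 1 when P points to an array of N chars.  N must be positive.  */
size_t
vla_row_stride (int n)
{
  char buf[2 * n];                    /* room for two rows of N bytes */
  char (*p)[n] = (char (*)[n]) buf;   /* pointer to a variably modified type */

  /* The size of *p is evaluated at run time, so the addition below is
     scaled by N rather than by a compile-time constant.  */
  return (size_t) ((char *) (p + 1) - (char *) p);
}
```

If the size expression of the pointed-to type has side effects, the order in which it and the pointer operand are evaluated starts to matter, which appears to be what the SAVE_EXPR wrapping in the patch is meant to pin down.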
Re: [PATCH] Fix spelling of ones' complement.
On Tue, 16 Nov 2021 15:55:55 +0100 Aldy Hernandez via Gcc-patches wrote:

> All sources before Knuth are clearly wrong. How could they not?
> Folks living in the pre-Knuth era lived without a deity.
>
> :-P

Not sure if this one's a compliment. Speaking of which:

$ git grep -i "complim"
gcc/ChangeLog-2000: addition over compliments over shifts.
gcc/ada/sem_util.adb: -- Assume that the main unit does not have a complimentary unit
gcc/ada/sem_util.adb: -- Obtain the complimentary unit of the main unit
gcc/config/fr30/fr30.c: /* Convert GCC's comparison operators into the complimentary FR30
gcc/config/mn10300/mn10300.md: /* Recall that twos-compliment is ones-compliment plus one. When
gcc/config/nds32/constraints.md: "A constant whose compliment value is in the range of imm15u
gcc/config/nds32/nds32.md:;; 'ONE_COMPLIMENT' operation
gcc/config/sparc/sparc.h: compliment of ordered and unordered comparisons, but until generic
gcc/config/visium/visium.h: compliment of ordered and unordered comparisons, but until generic
gcc/d/expr.cc: /* Build a compliment expression, where all the bits in the value are
gcc/d/intrinsics.cc: Variants of `bt' will then update that bit. `btc' compliments the bit, `bts'
gcc/doc/md.texi:A constant whose compliment value is in the range of imm15u
gcc/ipa-reference.c: /* Create the complimentary sets. */
libstdc++-v3/testsuite/data/thirty_years_among_the_dead_preproc.txt:compliment

Maybe someone competent should contemplate to complement the fixes for ones' two's complement in the above, except the first and last... ;)
[PATCH, committed] PR fortran/103286 - ICE in resolve_select, at fortran/resolve.c:8848
Committed to mainline as obvious after regtesting. When issuing an error on an invalid range in a SELECT CASE statement with a logical case expression, we need to be careful to use the right locus information. Thanks, Harald From 3b3c9932338650c9a402cf1bfbdf7dfc03e185e7 Mon Sep 17 00:00:00 2001 From: Harald Anlauf Date: Tue, 16 Nov 2021 21:06:06 +0100 Subject: [PATCH] Fortran: avoid NULL pointer dereference on invalid range in logical SELECT CASE gcc/fortran/ChangeLog: PR fortran/103286 * resolve.c (resolve_select): Choose appropriate range limit to avoid NULL pointer dereference when generating error message. gcc/testsuite/ChangeLog: PR fortran/103286 * gfortran.dg/pr103286.f90: New test. --- gcc/fortran/resolve.c | 3 ++- gcc/testsuite/gfortran.dg/pr103286.f90 | 11 +++ 2 files changed, 13 insertions(+), 1 deletion(-) create mode 100644 gcc/testsuite/gfortran.dg/pr103286.f90 diff --git a/gcc/fortran/resolve.c b/gcc/fortran/resolve.c index 705d2326a29..f074a0ab3a1 100644 --- a/gcc/fortran/resolve.c +++ b/gcc/fortran/resolve.c @@ -8846,7 +8846,8 @@ resolve_select (gfc_code *code, bool select_type) || cp->low != cp->high)) { gfc_error ("Logical range in CASE statement at %L is not " - "allowed", &cp->low->where); + "allowed", + cp->low ? &cp->low->where : &cp->high->where); t = false; break; } diff --git a/gcc/testsuite/gfortran.dg/pr103286.f90 b/gcc/testsuite/gfortran.dg/pr103286.f90 new file mode 100644 index 000..1c18b7136ce --- /dev/null +++ b/gcc/testsuite/gfortran.dg/pr103286.f90 @@ -0,0 +1,11 @@ +! { dg-do compile } +! { dg-options "std=gnu" } +! PR fortran/103286 - ICE in resolve_select + +program p + select case (.true.) ! { dg-warning "Extension: Conversion" } + case (1_8) + case (:0)! { dg-error "Logical range in CASE statement" } + case (2:)! { dg-error "Logical range in CASE statement" } + end select +end -- 2.26.2
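The fix boils down to picking whichever bound of the range is present before dereferencing it. Below is a stand-alone C sketch of that pattern with simplified stand-in types; `struct locus`, `struct expr`, `struct case_range` and `range_locus` are hypothetical and are not gfortran's real `gfc_expr`/`gfc_case` definitions.

```c
#include <stddef.h>

/* Simplified, hypothetical stand-ins for the front-end data structures.  */
struct locus { int line; };
struct expr { struct locus where; };
struct case_range { struct expr *low; struct expr *high; };

/* Return the location to report for CP.  "case (:0)" has no low bound
   and "case (2:)" has no high bound, so fall back to the bound that
   exists.  Assumes at least one of the two bounds is non-NULL.  */
const struct locus *
range_locus (const struct case_range *cp)
{
  return cp->low ? &cp->low->where : &cp->high->where;
}
```

With only the high bound set, as in `case (:0)`, the unguarded `&cp->low->where` would go through a NULL pointer, which matches the ICE described in the PR title.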
Re: [PATCH] restore ancient -Waddress for weak symbols [PR33925]
On 10/23/21 19:06, Martin Sebor wrote:
On 10/4/21 3:37 PM, Jason Merrill wrote:
On 10/4/21 14:42, Martin Sebor wrote:

While resolving the recent -Waddress enhancement request (PR102103) I came across a 2007 problem report about GCC 4 having stopped warning for using the address of inline functions in equality comparisons with null. With inline functions being commonplace in C++ this seems like an important use case for the warning.

The change that resulted in suppressing the warning in these cases was introduced inadvertently in a fix for PR 22252.

To restore the warning, the attached patch enhances the decl_with_nonnull_addr_p() function to return true also for weak symbols for which a definition has been provided.

I think you probably want to merge this function with fold-const.c:maybe_nonzero_address, which already handles more cases.

maybe_nonzero_address() doesn't behave quite like decl_with_nonnull_addr_p() expects and I'm reluctant to muck around with the former too much since it's used for codegen, while the latter is used just for warnings. (There is even a case where the functions don't behave the same, and would result in different warnings between C and C++ without some extra help.)

So in the attached revision I just have maybe_nonzero_address() call decl_with_nonnull_addr_p() and then refine the failing (or uncertain) cases separately, with some overlap between them.

Since I worked on this, someone complained that some instances of the warning newly enhanced under PR102103 aren't suppressed in code resulting from macro expansion. Since it's trivial, I include the fix for that report in this patch as well.

+ allocated stroage might have a null address. */

typo.

OK with that fixed.

Jason
Re: [RFC] c++: Print function template parms when relevant (was: [PATCH v4] c++: Add gnu::diagnose_as attribute)
On 11/8/21 15:00, Matthias Kretz wrote: I forgot to mention why I tagged it [RFC]: I needed one more bit of information on the template args TREE_VEC to encode EXPLICIT_TEMPLATE_ARGS_P. Its TREE_CHAIN already points to an integer constant denoting the number of non-default arguments, so I couldn't trivially replace that. Therefore, I used the sign of that integer. I was hoping to find a cleaner solution, though. It seems that we aren't using any TREE_LANG_FLAG_n on TREE_VEC, so that would be a cleaner solution. On Monday, 8 November 2021 17:40:44 CET Matthias Kretz wrote: On Tuesday, 17 August 2021 20:31:54 CET Jason Merrill wrote: 2. Given a DECL_TI_ARGS tree, can I query whether an argument was deduced or explicitly specified? I'm asking because I still consider diagnostics of function templates unfortunate. `template void f()` is fine, as is `void f(T) [with T = float]`, but `void f() [with T = float]` could be better. I.e. if the template parameter appears somewhere in the function parameter list, dump_template_parms would only produce noise. If, however, the template parameter was given explicitly, it would be nice if it could show up accordingly in diagnostics. NON_DEFAULT_TEMPLATE_ARGS_COUNT has that information, though there are some issues with it. Attached is my WIP from May to improve it somewhat, if that's interesting. It is interesting. I used your patch to come up with the attached. Patch. I must say, I didn't try to read through all the cp/pt.c code to understand all of what you did there (which is why my ChangeLog entry says "Jason?"), but it works for me (and all of `make check`). Anyway, I'd like to propose the following before finishing my diagnose_as patch. I believe it's useful to fix this part first. The diagnostic/default- template-args-[12].C tests show a lot of examples of the intent of this patch. And the remaining changes to the testsuite show how it changes diagnostic output. 
-- 8< The choice when to print a function template parameter was still suboptimal. That's because sometimes the function template parameter list only adds noise, while in other situations the lack of a function template parameter list makes diagnostic messages hard to understand. The general idea of this change is to print template parms wherever they would appear in the source code as well. Thus, the diagnostics code needs to know whether any template parameter was given explicitly. Signed-off-by: Matthias Kretz gcc/testsuite/ChangeLog: * g++.dg/debug/dwarf2/template-params-12n.C: Optionally, allow DW_AT_default_value. * g++.dg/diagnostic/default-template-args-1.C: New. * g++.dg/diagnostic/default-template-args-2.C: New. * g++.dg/diagnostic/param-type-mismatch-2.C: Expect template parms in diagnostic. * g++.dg/ext/pretty1.C: Expect function template specialization to not pretty-print template parms. * g++.old-deja/g++.ext/pretty3.C: Ditto. * g++.old-deja/g++.pt/memtemp77.C: Ditto. * g++.dg/goacc/template.C: Expect function template parms for explicit arguments. * g++.dg/gomp/declare-variant-7.C: Expect no function template parms for deduced arguments. * g++.dg/template/error40.C: Expect only non-default template arguments in diagnostic. gcc/cp/ChangeLog: * cp-tree.h (GET_NON_DEFAULT_TEMPLATE_ARGS_COUNT): Return absolute value of stored constant. (EXPLICIT_TEMPLATE_ARGS_P): New. (SET_EXPLICIT_TEMPLATE_ARGS_P): New. (TFF_AS_PRIMARY): New constant. * error.c (get_non_default_template_args_count): Avoid GET_NON_DEFAULT_TEMPLATE_ARGS_COUNT if NON_DEFAULT_TEMPLATE_ARGS_COUNT is a NULL_TREE. Make independent of flag_pretty_templates. (dump_template_bindings): Add flags parameter to be passed to get_non_default_template_args_count. Print only non-default template arguments. (dump_function_decl): Call dump_function_name and dump_type of the DECL_CONTEXT with specialized template and set TFF_AS_PRIMARY for their flags. 
(dump_function_name): Add and document conditions for calling dump_template_parms. (dump_template_parms): Print only non-default template parameters. * pt.c (determine_specialization): Jason? (template_parms_level_to_args): Jason? (copy_template_args): Jason? (fn_type_unification): Set EXPLICIT_TEMPLATE_ARGS_P on the template arguments tree if any template parameter was explicitly given. (type_unification_real): Jason? (get_partial_spec_bindings): Jason? (tsubst_template_args): Determine number of defaulted arguments from new argument vector, if possible. --- gcc/cp/cp-tree.h | 18 +++- gcc/cp/error.c
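The encoding described at the top of this message (a boolean stored in the sign of the non-default argument count) can be sketched in plain C. The names below are hypothetical and deliberately unrelated to the real cp-tree.h macros. Note that in this simple form a count of zero cannot carry the flag, which is one reason a cleaner representation was being asked for.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for the integer constant hung off the template
   argument vector.  */
struct arg_info { int stored; };

/* Store COUNT, negated when some argument was given explicitly.  */
void
set_count (struct arg_info *a, int count, bool is_explicit)
{
  a->stored = is_explicit ? -count : count;
}

/* The count is the absolute value of the stored integer.  */
int
get_count (const struct arg_info *a)
{
  return abs (a->stored);
}

/* The flag is the sign of the stored integer.  */
bool
has_explicit_args (const struct arg_info *a)
{
  return a->stored < 0;
}
```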
[PATCH v2] rs6000: Test case adjustments for new builtins
Hi!

I recently submitted [1] to make adjustments to test cases for the new builtins support, mostly due to error messages changing for consistency. Thanks for the previous review. I've reviewed the reasons for the changes and removed unrelated changes as requested.

A couple of comments:

- For fold-vec-splat-floatdouble.c and fold-vec-splat-longlong.c, the existing test cases have some bad tests in them (checking two bits when only one bit is meaningful). The new builtin support catches this but the old support did not. Removing those bad cases changes some of the scan-assembler-times expected values.

- For int_128bit-runnable.c, I chose not to do gimple folding on the 128-bit comparison operations in the new implementation, because doing so results in bad code that splits things into two 64-bit values. That needs separate attention; but the point here is, when I did that, I started generating more of the vcmpequq, vcmpgtsq, and vcmpgtuq instructions.

Everything else here is hopefully straightforward, and unchanged from the previous submission.

Bootstrapped and tested on powerpc64le-linux-gnu, and on powerpc64-linux-gnu with -m32 and -m64. Is this okay for trunk?

Thanks!
Bill

[1] https://gcc.gnu.org/pipermail/gcc-patches/2021-September/578615.html

2021-11-15  Bill Schmidt

gcc/testsuite/
	* gcc.target/powerpc/bfp/scalar-extract-exp-2.c: Adjust error message.
	* gcc.target/powerpc/bfp/scalar-extract-sig-2.c: Likewise.
	* gcc.target/powerpc/bfp/scalar-insert-exp-2.c: Likewise.
	* gcc.target/powerpc/bfp/scalar-insert-exp-5.c: Likewise.
	* gcc.target/powerpc/bfp/scalar-insert-exp-8.c: Likewise.
	* gcc.target/powerpc/bfp/scalar-test-neg-2.c: Likewise.
	* gcc.target/powerpc/bfp/scalar-test-neg-3.c: Likewise.
	* gcc.target/powerpc/bfp/scalar-test-neg-5.c: Likewise.
	* gcc.target/powerpc/byte-in-set-2.c: Likewise.
	* gcc.target/powerpc/cmpb-2.c: Likewise.
	* gcc.target/powerpc/cmpb32-2.c: Likewise.
	* gcc.target/powerpc/crypto-builtin-2.c: Likewise.
	* gcc.target/powerpc/fold-vec-splat-floatdouble.c: Remove invalid test and adjust xxpermdi count.
	* gcc.target/powerpc/fold-vec-splat-longlong.c: Remove invalid tests and adjust instruction counts.
	* gcc.target/powerpc/fold-vec-splat-misc-invalid.c: Adjust error messages.
	* gcc.target/powerpc/int_128bit-runnable.c: Adjust instruction counts since we do better by not gimple-folding some builtins.
	* gcc.target/powerpc/pr80315-1.c: Adjust error message.
	* gcc.target/powerpc/pr80315-2.c: Likewise.
	* gcc.target/powerpc/pr80315-3.c: Likewise.
	* gcc.target/powerpc/pr80315-4.c: Likewise.
	* gcc.target/powerpc/pr88100.c: Likewise.
	* gcc.target/powerpc/pragma_misc9.c: Likewise.
	* gcc.target/powerpc/pragma_power8.c: Undef _RS6000_VECDEFINES_H.
	* gcc.target/powerpc/pragma_power9.c: Likewise.
	* gcc.target/powerpc/test_fpscr_drn_builtin_error.c: Adjust error messages.
	* gcc.target/powerpc/test_fpscr_rn_builtin_error.c: Likewise.
	* gcc.target/powerpc/vec-gnb-2.c: Likewise.
	* gcc.target/powerpc/vsu/vec-all-nez-7.c: Likewise.
	* gcc.target/powerpc/vsu/vec-any-eqz-7.c: Likewise.
	* gcc.target/powerpc/vsu/vec-cmpnez-7.c: Likewise.
	* gcc.target/powerpc/vsu/vec-cntlz-lsbb-2.c: Likewise.
	* gcc.target/powerpc/vsu/vec-cnttz-lsbb-2.c: Likewise.
	* gcc.target/powerpc/vsu/vec-xl-len-13.c: Likewise.
	* gcc.target/powerpc/vsu/vec-xst-len-12.c: Likewise.
--- .../gcc.target/powerpc/bfp/scalar-extract-exp-2.c | 2 +- .../gcc.target/powerpc/bfp/scalar-extract-sig-2.c | 2 +- .../gcc.target/powerpc/bfp/scalar-insert-exp-2.c | 2 +- .../gcc.target/powerpc/bfp/scalar-insert-exp-5.c | 2 +- .../gcc.target/powerpc/bfp/scalar-insert-exp-8.c | 2 +- .../gcc.target/powerpc/bfp/scalar-test-neg-2.c | 2 +- .../gcc.target/powerpc/bfp/scalar-test-neg-3.c | 2 +- .../gcc.target/powerpc/bfp/scalar-test-neg-5.c | 2 +- gcc/testsuite/gcc.target/powerpc/byte-in-set-2.c | 2 +- gcc/testsuite/gcc.target/powerpc/cmpb-2.c | 2 +- gcc/testsuite/gcc.target/powerpc/cmpb32-2.c| 2 +- .../gcc.target/powerpc/crypto-builtin-2.c | 14 +++--- .../powerpc/fold-vec-splat-floatdouble.c | 4 ++-- .../gcc.target/powerpc/fold-vec-splat-longlong.c | 10 +++--- .../powerpc/fold-vec-splat-misc-invalid.c | 8 .../gcc.target/powerpc/int_128bit-runnable.c | 6 +++--- gcc/testsuite/gcc.target/powerpc/pr80315-1.c | 2 +- gcc/testsuite/gcc.target/powerpc/pr80315-2.c | 2 +- gcc/testsuite/gcc.target/powerpc/pr80315-3.c | 2 +- gcc/testsuite/gcc.target/powerpc/pr80315-4.c | 2 +- gcc/testsuite/gcc.target/powerpc/pr88100.c | 12 ++-- gcc/testsuite/gcc.
Re: [RFC] c++: Print function template parms when relevant (was: [PATCH v4] c++: Add gnu::diagnose_as attribute)
On Tuesday, 16 November 2021 21:25:33 CET Jason Merrill wrote: > On 11/8/21 15:00, Matthias Kretz wrote: > > I forgot to mention why I tagged it [RFC]: I needed one more bit of > > information on the template args TREE_VEC to encode > > EXPLICIT_TEMPLATE_ARGS_P. Its TREE_CHAIN already points to an integer > > constant denoting the number of non-default arguments, so I couldn't > > trivially replace that. Therefore, I used the sign of that integer. I was > > hoping to find a cleaner solution, though. > It seems that we aren't using any TREE_LANG_FLAG_n on TREE_VEC, so that > would be a cleaner solution. I tried that first but realized that TREE_VEC doesn't allow any TREE_LANG_FLAGs (it uses those bits for the length IIRC). And setting the TREE_LANG_FLAGs on the TREE_CHAIN of the TREE_VEC can't work either (since the int constants are shared between many trees). Should I maybe turn the TREE_CHAIN into a TREE_LIST using TREE_PURPOSE and TREE_VALUE for EXPLICIT_TEMPLATE_ARGS_P and non-default arguments, respectively? (And where would I document this?) -- ── Dr. Matthias Kretz https://mattkretz.github.io GSI Helmholtz Centre for Heavy Ion Research https://gsi.de stdₓ::simd ──
Re: [RFC] c++: Print function template parms when relevant (was: [PATCH v4] c++: Add gnu::diagnose_as attribute)
On 11/16/21 15:42, Matthias Kretz wrote: > On Tuesday, 16 November 2021 21:25:33 CET Jason Merrill wrote: >> On 11/8/21 15:00, Matthias Kretz wrote: >>> I forgot to mention why I tagged it [RFC]: I needed one more bit of information on the template args TREE_VEC to encode EXPLICIT_TEMPLATE_ARGS_P. Its TREE_CHAIN already points to an integer constant denoting the number of non-default arguments, so I couldn't trivially replace that. Therefore, I used the sign of that integer. I was hoping to find a cleaner solution, though. >> It seems that we aren't using any TREE_LANG_FLAG_n on TREE_VEC, so that would be a cleaner solution. > I tried that first but realized that TREE_VEC doesn't allow any TREE_LANG_FLAGs (it uses those bits for the length IIRC). And setting the TREE_LANG_FLAGs on the TREE_CHAIN of the TREE_VEC can't work either (since the int constants are shared between many trees). > Should I maybe turn the TREE_CHAIN into a TREE_LIST using TREE_PURPOSE and TREE_VALUE for EXPLICIT_TEMPLATE_ARGS_P and non-default arguments, respectively? (And where would I document this?) Maybe a TREE_LIST if there are explicit template arguments to a function template, where TREE_PURPOSE is the number of explicit arguments and TREE_VALUE is the number of non-default arguments. I'd document it at the definition of NON_DEFAULT_TEMPLATE_ARGS_COUNT. The SET/GET macros should become functions. Jason
Re: [RFC] c++: Print function template parms when relevant (was: [PATCH v4] c++: Add gnu::diagnose_as attribute)
On Tuesday, 16 November 2021 21:49:31 CET Jason Merrill wrote: > On 11/16/21 15:42, Matthias Kretz wrote: > > On Tuesday, 16 November 2021 21:25:33 CET Jason Merrill wrote: > >> On 11/8/21 15:00, Matthias Kretz wrote: > >>> I forgot to mention why I tagged it [RFC]: I needed one more bit of > >>> information on the template args TREE_VEC to encode > >>> EXPLICIT_TEMPLATE_ARGS_P. Its TREE_CHAIN already points to an integer > >>> constant denoting the number of non-default arguments, so I couldn't > >>> trivially replace that. Therefore, I used the sign of that integer. I > >>> was > >>> hoping to find a cleaner solution, though. > >> > >> It seems that we aren't using any TREE_LANG_FLAG_n on TREE_VEC, so that > >> would be a cleaner solution. > > > > I tried that first but realized that TREE_VEC doesn't allow any > > TREE_LANG_FLAGs (it uses those bits for the length IIRC). And setting the > > TREE_LANG_FLAGs on the TREE_CHAIN of the TREE_VEC can't work either (since > > the int constants are shared between many trees). > > > > Should I maybe turn the TREE_CHAIN into a TREE_LIST using TREE_PURPOSE and > > TREE_VALUE for EXPLICIT_TEMPLATE_ARGS_P and non-default arguments, > > respectively? (And where would I document this?) > > Maybe a TREE_LIST if there are explicit template arguments to a function > template, where TREE_PURPOSE is the number of explicit arguments and > TREE_VALUE is the number of non-default arguments. > > I'd document it at the definition of NON_DEFAULT_TEMPLATE_ARGS_COUNT. > The SET/GET macros should become functions. Sounds good. I'll come up with a new patch ASAP. -- ── Dr. Matthias Kretz https://mattkretz.github.io GSI Helmholtz Centre for Heavy Ion Research https://gsi.de stdₓ::simd ──
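For readers following the encoding question in this thread, here is a minimal standalone sketch of the two options discussed: hiding a flag in the sign of a count, versus storing two separate counts. It does not use GCC's tree API at all. `ArgInfo`, `encode_signed`, `decoded_count`, `decoded_explicit_p` and `has_explicit_args` are hypothetical names invented for this illustration, and the pair layout only mirrors the TREE_PURPOSE/TREE_VALUE roles suggested above.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical stand-in for the proposed TREE_LIST node; not a GCC type.
struct ArgInfo {
  int explicit_count;     // role of TREE_PURPOSE: number of explicit arguments
  int non_default_count;  // role of TREE_VALUE: number of non-default arguments
};

// Sign-based encoding: fold a boolean flag into the sign of the count.
// In this toy version a count of 0 cannot carry the flag, because -0 == 0.
inline int encode_signed(int non_default_count, bool explicit_p) {
  return explicit_p ? -non_default_count : non_default_count;
}

// Recover the count regardless of the flag.
inline int decoded_count(int encoded) { return std::abs(encoded); }

// Recover the flag from the sign.
inline bool decoded_explicit_p(int encoded) { return encoded < 0; }

// Pair-based encoding: the flag is implied by a non-zero explicit count,
// and both numbers stay independently readable.
inline bool has_explicit_args(const ArgInfo& info) {
  return info.explicit_count > 0;
}
```

With the pair, a caller reads `info.non_default_count` directly and never has to remember to strip a sign, which is one way to see why accessor functions are easier to keep correct than SET/GET macros over a packed value.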
Re: [PATCH] Fix spelling of ones' complement.
On Nov 15, 2021, at 5:48 PM, Marek Polacek via Gcc-patches wrote: > > Nitpicking time. It's spelled "ones' complement" rather than "one's > complement". I didn't go into config/. > > Ok for trunk? So, is it two's complement or twos' complement then? Seems like it should be the same, but wikipedia suggests it is two's complement, as does google. If that is wrong, you should go edit it as well. :-)
Re: [PATCH] Fix spelling of ones' complement.
On Tue, Nov 16, 2021 at 01:09:15PM -0800, Mike Stump via Gcc-patches wrote: > On Nov 15, 2021, at 5:48 PM, Marek Polacek via Gcc-patches > wrote: > > > > Nitpicking time. It's spelled "ones' complement" rather than "one's > > complement". I didn't go into config/. > > > > Ok for trunk? > > So, is it two's complement or twos' complement then? Seems like it should be > the same, but wikipedia suggests it is two's complement, as does google. If > that is wrong, you should go edit it as well. :-) It is "two's complement": https://gcc.gnu.org/pipermail/gcc-patches/2021-November/584543.html but Knuth also continues to say that there's "twos' complement notation", which "has radix 3 and complementation with respect to (2...22)_3." It's not lost on me how inconsequential this patch is; I'm happy to just drop it and let the copy editor in me sleep. Marek
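As a small illustration of the distinction behind the spelling, the sketch below computes both complements for 8-bit values. The ones' complement is taken with respect to a string of ones (255, i.e. 2^8 - 1), while the two's complement is taken with respect to a single power of two (2^8). The helper names `ones_complement8` and `twos_complement8` are made up for this example and do not come from GCC or the patch.

```cpp
#include <cassert>
#include <cstdint>

// Ones' complement in 8 bits: flip every bit, which equals (2^8 - 1) - x.
constexpr std::uint8_t ones_complement8(std::uint8_t x) {
  return static_cast<std::uint8_t>(~x);
}

// Two's complement in 8 bits: (2^8 - x) mod 2^8, i.e. ones' complement plus one.
constexpr std::uint8_t twos_complement8(std::uint8_t x) {
  return static_cast<std::uint8_t>(ones_complement8(x) + 1u);
}

static_assert(ones_complement8(5) == 250, "255 - 5");
static_assert(twos_complement8(5) == 251, "256 - 5");
static_assert(twos_complement8(0) == 0, "256 - 0 wraps to 0");
```

Adding a value to its ones' complement always gives 255 (all ones), and adding it to its two's complement always wraps to 0.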
[committed] libstdc++: Fix tests for constexpr std::string
Tested powerpc64le-linux, pushed to trunk. Some tests fail when run with -D_GLIBCXX_USE_CXX11_ABI=0 or -std=gnu++20. libstdc++-v3/ChangeLog: * include/bits/basic_string.h (operator<=>): Use constexpr unconditionally. * testsuite/21_strings/basic_string/modifiers/constexpr.cc: Require cxx11-abi effective target. * testsuite/21_strings/headers/string/synopsis.cc: Add conditional constexpr to declarations, and adjust relational operators for C++20. --- libstdc++-v3/include/bits/basic_string.h | 6 ++-- .../basic_string/modifiers/constexpr.cc | 1 + .../21_strings/headers/string/synopsis.cc | 33 +-- 3 files changed, 33 insertions(+), 7 deletions(-) diff --git a/libstdc++-v3/include/bits/basic_string.h b/libstdc++-v3/include/bits/basic_string.h index b6945f1cdfb..0b7d6c0a981 100644 --- a/libstdc++-v3/include/bits/basic_string.h +++ b/libstdc++-v3/include/bits/basic_string.h @@ -3546,8 +3546,7 @@ _GLIBCXX_END_NAMESPACE_CXX11 * greater than, or incomparable with `__rhs`. */ template -_GLIBCXX20_CONSTEXPR -inline auto +constexpr auto operator<=>(const basic_string<_CharT, _Traits, _Alloc>& __lhs, const basic_string<_CharT, _Traits, _Alloc>& __rhs) noexcept -> decltype(__detail::__char_traits_cmp_cat<_Traits>(0)) @@ -3561,8 +3560,7 @@ _GLIBCXX_END_NAMESPACE_CXX11 * greater than, or incomparable with `__rhs`. 
*/ template -_GLIBCXX20_CONSTEXPR -inline auto +constexpr auto operator<=>(const basic_string<_CharT, _Traits, _Alloc>& __lhs, const _CharT* __rhs) noexcept -> decltype(__detail::__char_traits_cmp_cat<_Traits>(0)) diff --git a/libstdc++-v3/testsuite/21_strings/basic_string/modifiers/constexpr.cc b/libstdc++-v3/testsuite/21_strings/basic_string/modifiers/constexpr.cc index c875a3a19ad..a4627714d9a 100644 --- a/libstdc++-v3/testsuite/21_strings/basic_string/modifiers/constexpr.cc +++ b/libstdc++-v3/testsuite/21_strings/basic_string/modifiers/constexpr.cc @@ -1,5 +1,6 @@ // { dg-options "-std=gnu++20" } // { dg-do compile { target c++20 } } +// { dg-require-effective-target cxx11-abi } #include #include diff --git a/libstdc++-v3/testsuite/21_strings/headers/string/synopsis.cc b/libstdc++-v3/testsuite/21_strings/headers/string/synopsis.cc index f14c4ae831c..f12345ed426 100644 --- a/libstdc++-v3/testsuite/21_strings/headers/string/synopsis.cc +++ b/libstdc++-v3/testsuite/21_strings/headers/string/synopsis.cc @@ -26,6 +26,12 @@ # define NOTHROW #endif +#if __cplusplus >= 202002L +# define CONSTEXPR constexpr +#else +# define CONSTEXPR +#endif + namespace std { // lib.char.traits, character traits: template @@ -40,33 +46,52 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11 _GLIBCXX_END_NAMESPACE_CXX11 template + CONSTEXPR basic_string operator+(const basic_string& lhs, const basic_string& rhs); template + CONSTEXPR basic_string operator+(const charT* lhs, const basic_string& rhs); template + CONSTEXPR basic_string operator+(charT lhs, const basic_string& rhs); template + CONSTEXPR basic_string operator+(const basic_string& lhs, const charT* rhs); template + CONSTEXPR basic_string operator+(const basic_string& lhs, charT rhs); template + CONSTEXPR bool operator==(const basic_string& lhs, const basic_string& rhs) NOTHROW; template - bool operator==(const charT* lhs, - const basic_string& rhs); - template + CONSTEXPR bool operator==(const basic_string& lhs, const charT* rhs); + +#if 
__cpp_lib_three_way_comparison + template + constexpr + bool operator<=>(const basic_string& lhs, + const basic_string& rhs) NOTHROW; + template + constexpr + bool operator<=>(const basic_string& lhs, + const charT* rhs); +#else + template + CONSTEXPR + bool operator==(const charT* lhs, + const basic_string& rhs); template bool operator!=(const basic_string& lhs, const basic_string& rhs) NOTHROW; @@ -114,9 +139,11 @@ _GLIBCXX_END_NAMESPACE_CXX11 template bool operator>=(const charT* lhs, const basic_string& rhs); +#endif // lib.string.special: template + CONSTEXPR void swap(basic_string& lhs, basic_string& rhs) #if __cplusplus >= 201103L -- 2.31.1
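The conditional CONSTEXPR macro added to synopsis.cc can be tried outside the testsuite with a sketch like the one below. The macro definition follows the patch; the function `clamp_to_byte` is a hypothetical example and is not part of libstdc++ or its tests. The point is that the same declaration compiles as a plain inline function before C++20 and as a constexpr function from C++20 on.

```cpp
#include <cassert>

// Same shape as the macro in the patch: constexpr only for C++20 and later.
#if __cplusplus >= 202002L
# define CONSTEXPR constexpr
#else
# define CONSTEXPR
#endif

// Hypothetical function used only to exercise the macro.
CONSTEXPR inline int clamp_to_byte(int v) {
  return v < 0 ? 0 : (v > 255 ? 255 : v);
}

#if __cplusplus >= 202002L
// Compile-time evaluation is only attempted where CONSTEXPR expands to constexpr.
static_assert(clamp_to_byte(300) == 255, "usable in constant expressions");
#endif
```

Guarding the compile-time check with the same `__cplusplus` condition as the macro keeps the file valid under every -std= setting, which matches how the synopsis test has to behave across standard modes.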