[PATCH v2] LoongArch: Optimize immediate load.

2022-10-31 Thread Lulu Cheng
v1 -> v2: 1. Change the code format. 2. Fix bugs in the code. Both regression tests and spec2006 passed. The problem mentioned in the link does not move the four immediate load instructions out of the loop. It has been optimized. Now, as in the test case, four immediate load instructions are gene

Re: Adding a new thread model to GCC

2022-10-31 Thread i.nixman--- via Gcc-patches
On 2022-10-31 09:18, Eric Botcazou wrote: hello Eric! This also changes libstdc++ to pass -D_WIN32_WINNT=0x0600 but only when the switch --enable-libstdcxx-threads is passed, which means that C++11 threads are still disabled by default *unless* MinGW-W64 itself is configured for Windows Vista

Re: [RFC] propgation leap over memory copy for struct

2022-10-31 Thread Jiufu Guo via Gcc-patches
Segher Boessenkool writes: > On Mon, Oct 31, 2022 at 04:13:38PM -0600, Jeff Law wrote: >> On 10/30/22 20:42, Jiufu Guo via Gcc-patches wrote: >> >We know that for struct variable assignment, memory copy may be used. >> >And for memcpy, we may load and store more bytes as possible at one time. >>

Re: [RFC] propgation leap over memory copy for struct

2022-10-31 Thread Jiufu Guo via Gcc-patches
Jeff Law writes: > On 10/30/22 20:42, Jiufu Guo via Gcc-patches wrote: >> Hi, >> >> We know that for struct variable assignment, memory copy may be used. >> And for memcpy, we may load and store more bytes as possible at one time. >> While it may be not best here: >> 1. Before/after struct variabl

Re: [RFC] propgation leap over memory copy for struct

2022-10-31 Thread Jiufu Guo via Gcc-patches
Segher Boessenkool writes: > Hi! > > On Mon, Oct 31, 2022 at 10:42:35AM +0800, Jiufu Guo wrote: >> #define FN 4 >> typedef struct { double a[FN]; } A; >> >> A foo (const A *a) { return *a; } >> A bar (const A a) { return a; } >> /// >> >> If FN<=2; the size of "A" fits into TImode, then thi

Re: [wwwdocs] [GCC13] Mention Intel __bf16 support in AVX512BF16 intrinsics.

2022-10-31 Thread Hongtao Liu via Gcc-patches
On Tue, Nov 1, 2022 at 9:21 AM Kong, Lingling via Gcc-patches wrote: > > Hi > > The patch is to mention Intel __bf16 support in AVX512BF16 intrinsics. > Ok for master ? > > Thanks, > Lingling > > --- > htdocs/gcc-13/changes.html | 2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/htdocs/g

[pushed] c++: set TREE_NOTHROW after genericize

2022-10-31 Thread Jason Merrill via Gcc-patches
Tested x86_64-pc-linux-gnu, applying to trunk. -- >8 -- genericize might introduce function calls (and does on the contracts branch), so it's safer to set this flag later. gcc/cp/ChangeLog: * decl.cc (finish_function): Set TREE_NOTHROW later in the function. --- gcc/cp/decl.cc | 16 +++

[wwwdocs] [GCC13] Mention Intel __bf16 support in AVX512BF16 intrinsics.

2022-10-31 Thread Kong, Lingling via Gcc-patches
Hi The patch is to mention Intel __bf16 support in AVX512BF16 intrinsics. Ok for master ? Thanks, Lingling --- htdocs/gcc-13/changes.html | 2 ++ 1 file changed, 2 insertions(+) diff --git a/htdocs/gcc-13/changes.html b/htdocs/gcc-13/changes.html index 7c6bfa6e..cd0282f1 100644 --- a/htdocs/

Re: [RFC] propgation leap over memory copy for struct

2022-10-31 Thread Segher Boessenkool
On Mon, Oct 31, 2022 at 04:13:38PM -0600, Jeff Law wrote: > On 10/30/22 20:42, Jiufu Guo via Gcc-patches wrote: > >We know that for struct variable assignment, memory copy may be used. > >And for memcpy, we may load and store more bytes as possible at one time. > >While it may be not best here: >

Re: [RFC] propgation leap over memory copy for struct

2022-10-31 Thread Segher Boessenkool
Hi! On Mon, Oct 31, 2022 at 10:42:35AM +0800, Jiufu Guo wrote: > #define FN 4 > typedef struct { double a[FN]; } A; > > A foo (const A *a) { return *a; } > A bar (const A a) { return a; } > /// > > If FN<=2; the size of "A" fits into TImode, then this code can be optimized > (by subreg/cse/
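
The quoted test case can be turned into a small translation unit for looking at the code the compiler generates for each copy. This is only a sketch: `FN`, `A`, `foo` and `bar` are taken from the message, while `make_a` and `same_a` are hypothetical helpers added here so that the two copies can be checked.

```c
#include <assert.h>

#define FN 4
typedef struct { double a[FN]; } A;

/* From the quoted message: return a struct loaded through a pointer.  */
A foo (const A *a) { return *a; }

/* From the quoted message: return a struct that was passed by value.  */
A bar (const A a) { return a; }

/* Hypothetical helper: build a value with distinct elements.  */
A make_a (double base)
{
  A r;
  for (int i = 0; i < FN; i++)
    r.a[i] = base + i;
  return r;
}

/* Hypothetical helper: compare two values element by element.  */
int same_a (const A *x, const A *y)
{
  assert (x && y);
  for (int i = 0; i < FN; i++)
    if (x->a[i] != y->a[i])
      return 0;
  return 1;
}
```

Changing `FN` to 2 or less gives the smaller case that the message contrasts with; the helpers do not depend on the value.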

Re: Re: [PATCH] RISC-V: Fix RVV testcases.

2022-10-31 Thread 钟居哲
These cases actually don't care about -mabi; they just need 'v' in -march. Can you tell me how to fix these testcases for "fails on targets without ilp32d"? These failures are bogus failures since if you specify -mabi=ilp32d when you are using a GNU toolchain which is built with "--arch=ilp32

Re: [committed] More gimple const/copy propagation opportunities

2022-10-31 Thread Jeff Law via Gcc-patches
On 10/1/22 12:55, Bernhard Reutner-Fischer wrote: On Fri, 30 Sep 2022 17:32:34 -0600 Jeff Law wrote: + /* This looks good from a CFG standpoint. Now look at the guts + of PRED. Basically we want to verify there are no PHI nodes + and no real statements. */ + if (! gimple_seq_emp

Re: [PATCH] Add __builtin_iseqsig()

2022-10-31 Thread Joseph Myers
On Mon, 31 Oct 2022, FX via Gcc-patches wrote: > - rounded conversions: converting, from an integer or floating point > type, into another floating point type, with specific rounding mode > passed as argument These don't have standard C names. The way to do these in C would be using the FENV_

Re: [PATCH] RISC-V: Fix RVV testcases.

2022-10-31 Thread Palmer Dabbelt
On Mon, 31 Oct 2022 15:00:49 PDT (-0700), gcc-patches@gcc.gnu.org wrote: On 10/30/22 19:40, juzhe.zh...@rivai.ai wrote: From: Ju-Zhe Zhong gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/base/abi-2.c: Change ilp32d to ilp32. * gcc.target/riscv/rvv/base/abi-3.c: Ditto.

Re: Re: [PATCH] RISC-V: Fix RVV testcases.

2022-10-31 Thread 钟居哲
These testcases do not depend on the ABI specification. I picked the minimum ABI setting so that they won't fail. The naming of the abi-* tests may be confusing; I can change the naming next time. juzhe.zh...@rivai.ai From: Jeff Law Date: 2022-11-01 06:00 To: juzhe.zhong; gcc-patches CC: sc

Re: [PATCH v5] RISC-V: Libitm add RISC-V support.

2022-10-31 Thread Jeff Law via Gcc-patches
On 10/29/22 03:01, Xiongchuan Tan wrote: Reviewed-by: Palmer Dabbelt Acked-by: Palmer Dabbelt libitm/ChangeLog: * configure.tgt: Add riscv support. * config/riscv/asm.h: New file. * config/riscv/sjlj.S: New file. * config/riscv/target.h: New file. Pushed

Re: [RFC] propgation leap over memory copy for struct

2022-10-31 Thread Jeff Law via Gcc-patches
On 10/30/22 20:42, Jiufu Guo via Gcc-patches wrote: Hi, We know that for struct variable assignment, memory copy may be used. And for memcpy, we may load and store more bytes as possible at one time. While it may be not best here: 1. Before/after struct variable assignment, the variable may be o

Re: [PATCH] RISC-V: Fix RVV testcases.

2022-10-31 Thread Jeff Law via Gcc-patches
On 10/30/22 19:40, juzhe.zh...@rivai.ai wrote: From: Ju-Zhe Zhong gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/base/abi-2.c: Change ilp32d to ilp32. * gcc.target/riscv/rvv/base/abi-3.c: Ditto. * gcc.target/riscv/rvv/base/abi-4.c: Ditto. * gcc.target/ris

Re: [PATCH 3/8]middle-end: Support extractions of subvectors from arbitrary element position inside a vector

2022-10-31 Thread Jeff Law via Gcc-patches
On 10/31/22 05:57, Tamar Christina wrote: Hi All, The current vector extract pattern can only extract from a vector when the position to extract is a multiple of the vector bitsize as a whole. That means extract something like a V2SI from a V4SI vector from position 32 isn't possible as 32 is
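
The kind of extraction described here can be written at the source level with the GCC/Clang vector extensions. The following is a minimal sketch under stated assumptions: `v4si`, `v2si` and `extract_v2si` are names chosen for illustration, and nothing is claimed about which RTL pattern the compiler actually selects for it.

```c
#include <assert.h>

/* Vector extension types: four and two 32-bit ints.  */
typedef int v4si __attribute__ ((vector_size (16)));
typedef int v2si __attribute__ ((vector_size (8)));

/* Hypothetical helper: take the two elements starting at element START.
   With START == 1 the extraction begins at bit position 32, which is
   not a multiple of the 64-bit size of the V2SI result.  */
v2si extract_v2si (v4si v, int start)
{
  assert (start >= 0 && start <= 2);
  v2si r = { v[start], v[start + 1] };
  return r;
}
```

Calling `extract_v2si (v, 1)` is the "V2SI from a V4SI at position 32" case from the message; `start` values of 0 and 2 are the aligned cases.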

[r13-3570 Regression] FAIL: g++.dg/other/pr39060.C -std=c++98 (test for excess errors) on Linux/x86_64

2022-10-31 Thread haochen.jiang via Gcc-patches
On Linux/x86_64, 259a11555c90783e53c046c310080407ee54a31e is the first bad commit commit 259a11555c90783e53c046c310080407ee54a31e Author: Jakub Jelinek Date: Mon Oct 31 09:09:48 2022 +0100 builtins: Add various complex builtins for _Float{16,32,64,128,32x,64x,128x} caused FAIL: g++.dg/ot

Re: [PATCH 2/8]middle-end: Recognize scalar widening reductions

2022-10-31 Thread Jeff Law via Gcc-patches
On 10/31/22 05:57, Tamar Christina wrote: Hi All, This adds a new optab and IFNs for REDUC_PLUS_WIDEN where the resulting scalar reduction has twice the precision of the input elements. At some point in a later patch I will also teach the vectorizer to recognize this builtin once I figure out
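
As an illustration of a reduction whose scalar result has twice the precision of its inputs, here is a minimal C sketch. The function name `sum_widen` is hypothetical, and whether a given compiler maps this loop onto the new optab is not something this example asserts.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sum 32-bit elements into a 64-bit accumulator, so the result does
   not wrap where a 32-bit accumulator would.  */
int64_t sum_widen (const int32_t *a, size_t n)
{
  assert (a != NULL || n == 0);
  int64_t sum = 0;
  for (size_t i = 0; i < n; i++)
    sum += (int64_t) a[i];   /* widen each element before adding */
  return sum;
}
```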

Re: [PATCH 1/8]middle-end: Recognize scalar reductions from bitfields and array_refs

2022-10-31 Thread Jeff Law via Gcc-patches
On 10/31/22 05:56, Tamar Christina wrote: Hi All, This patch series is to add recognition of pairwise operations (reductions) in match.pd such that we can benefit from them even at -O1 when the vectorizer isn't enabled. The use of these allows for a lot simpler codegen in AArch64 and allows us

Re: [PATCH]middle-end simplify complex if expressions where comparisons are inverse of one another.

2022-10-31 Thread Jeff Law via Gcc-patches
On 10/31/22 05:42, Tamar Christina via Gcc-patches wrote: Hi, This is a cleaned up version addressing all feedback. Bootstrapped Regtested on aarch64-none-linux-gnu, x86_64-pc-linux-gnu and no issues. Ok for master? Thanks, Tamar gcc/ChangeLog: * match.pd: Add new rule. gcc/tests

Re: [PATCH 1/2]middle-end: Add new tbranch optab to add support for bit-test-and-branch operations

2022-10-31 Thread Jeff Law via Gcc-patches
On 10/31/22 05:53, Tamar Christina wrote: Hi All, This adds a new test-and-branch optab that can be used to do a conditional test of a bit and branch. This is similar to the cbranch optab but instead can test any arbitrary bit inside the register. This patch recognizes boolean comparisons a
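
A source-level shape that tests one arbitrary bit and branches on it might look like the sketch below. The function `classify` is hypothetical and only shows the control flow; it makes no claim about when the compiler would use the new optab.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical example: branch on a single, arbitrary bit of X.  */
int classify (uint32_t x, unsigned bit)
{
  assert (bit < 32);
  if (x & (UINT32_C (1) << bit))
    return 1;   /* bit set: taken path */
  return 0;     /* bit clear: fall-through path */
}
```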

[PATCH] x86: Track converted/skipped registers in STV

2022-10-31 Thread H.J. Lu via Gcc-patches
When converting integer computations into vector ones, we build a chain from an integer definition instruction together with all dependent use instructions. The integer computations on the chain are converted to vector ones if the total vector costs are lower than the integer ones. Since the same

[PATCH] libstdc++: Implement ranges::as_rvalue_view from P2446R2

2022-10-31 Thread Patrick Palka via Gcc-patches
Tested on x86_64-pc-linux-gnu, does this look OK for trunk? libstdc++-v3/ChangeLog: * include/std/ranges (as_rvalue_view): Define. (enable_borrowed_range): Define. (views::__detail::__can_as_rvalue_view): Define. (views::_AsRvalue, views::as_rvalue): Define.

Re: [PATCH, v2] Fortran: ordering of hidden procedure arguments [PR107441]

2022-10-31 Thread Harald Anlauf via Gcc-patches
Hi Mikael, thanks a lot, your testcases broke my initial (and incorrect) patch in multiple ways. I understand now that the right solution is much simpler and smaller. I've added your testcases, see attached, with a simple scan of the dump for the generated order of hidden arguments in the funct

Re: [PATCH] Add __builtin_iseqsig()

2022-10-31 Thread FX via Gcc-patches
Hi, Just adding, from the Fortran 2018 perspective, things we will need to implement for which I think support from the middle-end might be necessary: - rounded conversions: converting, from an integer or floating point type, into another floating point type, with specific rounding mode passed

[PATCH] c, analyzer: support named constants in analyzer [PR106302]

2022-10-31 Thread David Malcolm via Gcc-patches
The analyzer's file-descriptor state machine tracks the access mode of opened files, so that it can emit -Wanalyzer-fd-access-mode-mismatch. To do this, its symbolic execution needs to "know" the values of the constants "O_RDONLY", "O_WRONLY", and "O_ACCMODE". Currently analyzer/sm-fd.cc simply u
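
To see why the values of these constants matter, consider how an access mode is recovered from `open` flags at run time. This is a sketch only: `flags_allow_write` and `require_writable` are hypothetical helpers, not part of the analyzer or of sm-fd.cc.

```c
#include <assert.h>
#include <fcntl.h>

/* Hypothetical helper: do FLAGS (as passed to open) permit write()?
   The access mode is a field selected by O_ACCMODE rather than a set
   of independent bits, so it is masked out and then compared.  */
int flags_allow_write (int flags)
{
  int mode = flags & O_ACCMODE;
  return mode == O_WRONLY || mode == O_RDWR;
}

/* Run-time analogue of the mismatch that the analyzer reports
   statically: abort if a descriptor opened with FLAGS is about to be
   written to.  */
void require_writable (int flags)
{
  assert (flags_allow_write (flags));
}
```

A static checker has to perform the same mask-and-compare symbolically, which is only possible if it knows the numeric values the target's headers give to the three names.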

Re: [PATCH v4] btf: Add support to BTF_KIND_ENUM64 type

2022-10-31 Thread Indu Bhagat via Gcc-patches
On 10/21/22 2:28 AM, Indu Bhagat via Gcc-patches wrote: On 10/19/22 19:05, Guillermo E. Martinez wrote: Hello, The following is patch v4 to update BTF/CTF backend supporting BTF_KIND_ENUM64 type. Changes from v3:    + Remove `ctf_enum_binfo' structure.    + Remove -m{little,big}-endian from dg

Re: [PATCH] Add __builtin_iseqsig()

2022-10-31 Thread Joseph Myers
On Fri, 28 Oct 2022, Jeff Law via Gcc-patches wrote: > Joseph, do you have bits in this space that are going to be landing soon, or > is your C2X work focused elsewhere?  Are there other C2X routines we need to > be proving builtins for? I don't have any builtins work planned for GCC 13 (maybe ad

Re: [PATCH 1/4]middle-end Support not decomposing specific divisions during vectorization.

2022-10-31 Thread Jeff Law via Gcc-patches
On 10/31/22 05:34, Tamar Christina wrote: The type of the expression should be available via the mode and the signedness, no? So maybe to avoid having both RTX and TREE on the target hook pass it a wide_int instead for the divisor? Done. Bootstrapped Regtested on aarch64-none-linux-gnu, x86

Re: [committed] libstdc++: Fix compare_exchange_padding.cc test for std::atomic_ref

2022-10-31 Thread Jonathan Wakely via Gcc-patches
On Mon, 31 Oct 2022 at 17:03, Eric Botcazou wrote: > > > I suppose we could use memcmp on the as variable itself, to inspect > > the actual stored padding rather than the returned copy of it. > > Yes, that's probably the only safe stance when optimization is enabled. Strictly speaking, it's not

Re: [PATCH] libstdc++-v3: support for extended floating point types

2022-10-31 Thread Jonathan Wakely via Gcc-patches
On Mon, 31 Oct 2022 at 16:57, Jakub Jelinek wrote: > > On Mon, Oct 31, 2022 at 10:26:11AM +, Jonathan Wakely wrote: > > > --- libstdc++-v3/include/std/complex.jj 2022-10-21 08:55:43.037675332 > > > +0200 > > > +++ libstdc++-v3/include/std/complex2022-10-21 17:05:36.802243229 > > > +0200

Re: [committed] libstdc++: Fix compare_exchange_padding.cc test for std::atomic_ref

2022-10-31 Thread Eric Botcazou via Gcc-patches
> I suppose we could use memcmp on the as variable itself, to inspect > the actual stored padding rather than the returned copy of it. Yes, that's probably the only safe stance when optimization is enabled. -- Eric Botcazou

Re: [PATCH] libstdc++-v3: support for extended floating point types

2022-10-31 Thread Jakub Jelinek via Gcc-patches
On Mon, Oct 31, 2022 at 10:26:11AM +, Jonathan Wakely wrote: > > --- libstdc++-v3/include/std/complex.jj 2022-10-21 08:55:43.037675332 +0200 > > +++ libstdc++-v3/include/std/complex2022-10-21 17:05:36.802243229 +0200 > > @@ -142,8 +142,14 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION > > > >/

Re: Ping^3 [PATCH V2] Add attribute hot judgement for INLINE_HINT_known_hot hint.

2022-10-31 Thread Jeff Law via Gcc-patches
On 10/30/22 19:44, Cui, Lili wrote: On 10/20/22 19:52, Cui, Lili via Gcc-patches wrote: Hi Honza, Gentle ping https://gcc.gnu.org/pipermail/gcc-patches/2022-September/601934.html gcc/ChangeLog * ipa-inline-analysis.cc (do_estimate_edge_time): Add function attribute judgement for INL

Re: [PATCH 1/2]middle-end Fold BIT_FIELD_REF and Shifts into BIT_FIELD_REFs alone

2022-10-31 Thread Jeff Law via Gcc-patches
On 10/31/22 05:51, Tamar Christina via Gcc-patches wrote: Hi All, Here's a respin addressing review comments. Bootstrapped Regtested on aarch64-none-linux-gnu, x86_64-pc-linux-gnu and no issues. Ok for master? Thanks, Tamar gcc/ChangeLog: * match.pd: Add bitfield and shift folding

Re: [PATCH]middle-end Add optimized float addsub without needing VEC_PERM_EXPR.

2022-10-31 Thread Jeff Law via Gcc-patches
On 10/31/22 05:38, Tamar Christina via Gcc-patches wrote: Hi All, This is a respin with all feedback addressed. Bootstrapped Regtested on aarch64-none-linux-gnu and no issues. Ok for master? Thanks, Tamar gcc/ChangeLog: * match.pd: Add fneg/fadd rule. gcc/testsuite/ChangeLog:

[PATCH v7 34/34] Add -mpure-code support to the CM0 functions.

2022-10-31 Thread Daniel Engel
gcc/libgcc/ChangeLog: 2022-10-09 Daniel Engel Makefile.in (MPURE_CODE): New macro defines __PURE_CODE__. (gcc_compile): Appended MPURE_CODE. lib1funcs.S (FUNC_START_SECTION): Set flags for __PURE_CODE__. clz2.S (__clzsi2): Added -mpure-code compatible instructions.

[PATCH v7 32/34] Import float<->__fp16 conversion from the CM0 library

2022-10-31 Thread Daniel Engel
gcc/libgcc/ChangeLog: 2022-10-09 Daniel Engel * config/arm/eabi/fcast.S (__aeabi_h2f, __aeabi_f2h): Added functions. * config/arm/fp16 (__gnu_f2h_ieee, __gnu_h2f_ieee, __gnu_f2h_alternative, __gnu_h2f_alternative): Disable build for v6m multilibs. * config/arm/t-b

[PATCH v7 30/34] Import float-to-integer conversion from the CM0 library

2022-10-31 Thread Daniel Engel
gcc/libgcc/ChangeLog: 2022-10-09 Daniel Engel * config/arm/bpabi-lib.h (muldi3): Removed duplicate. (fixunssfsi) Removed obsolete RENAME_LIBRARY directive. * config/arm/eabi/ffixed.S (__aeabi_f2iz, __aeabi_f2uiz, __aeabi_f2lz, __aeabi_f2ulz): New file. * co

[PATCH v7 29/34] Import integer-to-float conversion from the CM0 library

2022-10-31 Thread Daniel Engel
gcc/libgcc/ChangeLog: 2022-10-09 Daniel Engel * config/arm/bpabi-lib.h (__floatdisf, __floatundisf): Remove obsolete RENAME_LIBRARY directives. * config/arm/eabi/ffloat.S (__aeabi_i2f, __aeabi_l2f, __aeabi_ui2f, __aeabi_ul2f): New file. * config/arm/lib1fun

[PATCH v7 31/34] Import float<->double conversion from the CM0 library

2022-10-31 Thread Daniel Engel
gcc/libgcc/ChangeLog: 2022-10-09 Daniel Engel * config/arm/eabi/fcast.S (__aeabi_d2f, __aeabi_f2d): New file. * config/arm/lib1funcs.S: #include eabi/fcast.S (v6m only). * config/arm/t-elf (LIB1ASMFUNCS): Added _arm_d2f and _arm_f2d. --- libgcc/config/arm/eabi/fcast.S | 2

[PATCH v7 28/34] Import float division from the CM0 library

2022-10-31 Thread Daniel Engel
gcc/libgcc/ChangeLog: 2022-10-09 Daniel Engel * config/arm/eabi/fdiv.S (__divsf3, __fp_divloopf): New file. * config/arm/lib1funcs.S: #include eabi/fdiv.S (v6m only). * config/arm/t-elf (LIB1ASMFUNCS): Added _divsf3 and _fp_divloopf. --- libgcc/config/arm/eabi/fdiv.S | 26

[PATCH v7 27/34] Import float multiplication from the CM0 library

2022-10-31 Thread Daniel Engel
gcc/libgcc/ChangeLog: 2022-10-09 Daniel Engel * config/arm/eabi/fmul.S (__mulsf3): New file. * config/arm/lib1funcs.S: #include eabi/fmul.S (v6m only). * config/arm/t-elf (LIB1ASMFUNCS): Moved _mulsf3 to global scope (this object was previously blocked on v6m build

[PATCH v7 33/34] Drop single-precision Thumb-1 soft-float functions

2022-10-31 Thread Daniel Engel
With the complete CM0 library integrated, regression testing showed new failures with the message "compilation failed to produce executable": gcc.dg/fixed-point/convert-float-1.c gcc.dg/fixed-point/convert-float-3.c gcc.dg/fixed-point/convert-sat.c Investigating, this appears to be ca

[PATCH v7 26/34] Import float addition and subtraction from the CM0 library

2022-10-31 Thread Daniel Engel
Since this is the first import of single-precision functions, some common parsing and formatting routines are also included. These common routines will be referenced by other functions in subsequent commits. However, even if the size penalty is accounted entirely to __addsf3(), the total compiled s

[PATCH v7 23/34] Refactor Thumb-1 float comparison into a new file

2022-10-31 Thread Daniel Engel
gcc/libgcc/ChangeLog: 2022-10-09 Daniel Engel * config/arm/bpabi-v6m.S (__aeabi_cfcmpeq, __aeabi_cfcmple, __aeabi_cfrcmple, __aeabi_fcmpeq, __aeabi_fcmple, aeabi_fcmple, __aeabi_fcmpgt, aeabi_fcmpge): Moved to ... * config/arm/eabi/fcmp.S: New file. * confi

[PATCH v7 20/34] Refactor Thumb-1 64-bit division into a new file

2022-10-31 Thread Daniel Engel
gcc/libgcc/ChangeLog: 2022-10-09 Daniel Engel * config/arm/bpabi-v6m.S (__aeabi_ldivmod/ldivmod): Moved to ... * config/arm/eabi/ldiv.S: New file. * config/arm/lib1funcs.S: #include eabi/ldiv.S (v6m only). --- libgcc/config/arm/bpabi-v6m.S | 81 -

[PATCH v7 24/34] Import float comparison from the CM0 library

2022-10-31 Thread Daniel Engel
These functions are significantly smaller and faster than the wrapper functions and soft-float implementation they replace. Using the first comparison operator (e.g. '<=') in any program costs about 70 bytes initially, but every additional operator incrementally adds just 4 bytes. NOTE: It seems

[PATCH v7 25/34] Refactor Thumb-1 float subtraction into a new file

2022-10-31 Thread Daniel Engel
This will make it easier to isolate changes in subsequent patches. gcc/libgcc/ChangeLog: 2022-10-09 Daniel Engel * config/arm/bpabi-v6m.S (__aeabi_frsub): Moved to ... * config/arm/eabi/fadd.S: New file. * config/arm/lib1funcs.S: #include eabi/fadd.S (v6m only). --- libg

[PATCH v7 22/34] Import integer multiplication from the CM0 library

2022-10-31 Thread Daniel Engel
gcc/libgcc/ChangeLog: 2022-10-09 Daniel Engel * config/arm/eabi/lmul.S: New file for __muldi3(), __mulsidi3(), and __umulsidi3(). * config/arm/lib1funcs.S: #include eabi/lmul.S (v6m only). * config/arm/t-elf: Add the new objects to LIB1ASMFUNCS. --- libgcc/config/arm/eab

[PATCH v7 19/34] Import 32-bit division from the CM0 library

2022-10-31 Thread Daniel Engel
gcc/libgcc/ChangeLog: 2022-10-09 Daniel Engel * config/arm/eabi/idiv.S: New file for __udivsi3() and __divsi3(). * config/arm/lib1funcs.S: #include eabi/idiv.S (v6m only). --- libgcc/config/arm/eabi/idiv.S | 299 ++ libgcc/config/arm/lib1funcs.S |

[PATCH v7 14/34] Import 'parity' functions from the CM0 library

2022-10-31 Thread Daniel Engel
The functional overlap between the single- and double-word functions makes this implementation about half the size of the C functions if both functions are linked in the same application. gcc/libgcc/ChangeLog: 2022-10-09 Daniel Engel * config/arm/parity.S: New file for __

[PATCH v7 21/34] Import 64-bit division from the CM0 library

2022-10-31 Thread Daniel Engel
gcc/libgcc/ChangeLog: 2022-10-09 Daniel Engel * config/arm/bpabi.c: Deleted unused file. * config/arm/eabi/ldiv.S (__aeabi_ldivmod, __aeabi_uldivmod): Replaced wrapper functions with a complete implementation. * config/arm/t-bpabi (LIB2ADD_ST): Removed bpabi.c.

[PATCH v7 16/34] Refactor Thumb-1 64-bit comparison into a new file

2022-10-31 Thread Daniel Engel
This will make it easier to isolate changes in subsequent patches. gcc/libgcc/ChangeLog: 2022-10-09 Daniel Engel * config/arm/bpabi-v6m.S (__aeabi_lcmp, __aeabi_ulcmp): Moved to ... * config/arm/eabi/lcmp.S: New file. * config/arm/lib1funcs.S: #include eabi/lcmp.S. --- l

[PATCH v7 12/34] Import 'clrsb' functions from the CM0 library

2022-10-31 Thread Daniel Engel
This implementation provides an efficient tail call to __clzsi2(), making the functions rather smaller and faster than the C versions. gcc/libgcc/ChangeLog: 2022-10-09 Daniel Engel * config/arm/bits/clz2.S (__clrsbsi2, __clrsbdi2): Added new functions. * config/arm/t-elf

[PATCH v7 13/34] Import 'ffs' functions from the CM0 library

2022-10-31 Thread Daniel Engel
This implementation provides an efficient tail call to __clzdi2(), making the functions rather smaller and faster than the C versions. gcc/libgcc/ChangeLog: 2022-10-09 Daniel Engel * config/arm/bits/ctz2.S (__ffssi2, __ffsdi2): New functions. * config/arm/t-elf (LIB1ASMFUNCS): Ad

[PATCH v7 18/34] Merge Thumb-2 optimizations for 64-bit comparison

2022-10-31 Thread Daniel Engel
This effectively merges support for all architecture variants into a common function path with appropriate build conditions. ARM performance is 1-2 instructions faster; Thumb-2 is about 50% faster. gcc/libgcc/ChangeLog: 2022-10-09 Daniel Engel * config/arm/bpabi.S (__aeabi_lcmp, __aeabi_

[PATCH v7 11/34] Import 64-bit shift functions from the CM0 library

2022-10-31 Thread Daniel Engel
The Thumb versions of these functions are each 1-2 instructions smaller and faster, and branchless when the IT instruction is available. The ARM versions were converted to the "xxl/xxh" big-endian register naming convention, but are otherwise unchanged. gcc/libgcc/ChangeLog: 2022-10-09 Daniel Eng

[PATCH v7 10/34] Import 'ctz' functions from the CM0 library

2022-10-31 Thread Daniel Engel
This version combines __ctzdi2() with __ctzsi2() into a single object with an efficient tail call. The former implementation of __ctzdi2() was in C. On architectures without __ARM_FEATURE_CLZ, this version merges the formerly separate Thumb and ARM code sequences into a unified instruction sequen
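
The idea of sharing one code path between the 32-bit and 64-bit routines through a tail call can be sketched in C. This is not the patch's assembly: `ctz32_from`, `ctz32` and `ctz64` are hypothetical names, and the bit-by-bit loop is a placeholder for whatever sequence the real implementation uses.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical shared worker: count trailing zeros of nonzero X and
   add BIAS, so that callers can finish with a plain tail call.  */
static unsigned ctz32_from (uint32_t x, unsigned bias)
{
  assert (x != 0);
  while ((x & 1) == 0)
    {
      x >>= 1;
      bias++;
    }
  return bias;
}

/* 32-bit entry point.  */
unsigned ctz32 (uint32_t x)
{
  return ctz32_from (x, 0);
}

/* 64-bit entry point: choose the word holding the lowest set bit,
   then hand over to the 32-bit worker in tail position.  */
unsigned ctz64 (uint64_t x)
{
  assert (x != 0);
  uint32_t lo = (uint32_t) x;
  if (lo != 0)
    return ctz32_from (lo, 0);
  return ctz32_from ((uint32_t) (x >> 32), 32);
}
```

Passing the bias as an argument keeps both calls in tail position; writing `32 + ctz32 (hi)` instead would leave work to do after the call returns.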

[PATCH v7 17/34] Import 64-bit comparison from CM0 library

2022-10-31 Thread Daniel Engel
These are 2-5 instructions smaller and just as fast. Branches are minimized, which will allow easier adaptation to Thumb-2/ARM mode. gcc/libgcc/ChangeLog: 2022-10-09 Daniel Engel * config/arm/eabi/lcmp.S (__aeabi_lcmp, __aeabi_ulcmp): Replaced; add macro configuration to build _

[PATCH v7 09/34] Import 'clz' functions from the CM0 library

2022-10-31 Thread Daniel Engel
On architectures without __ARM_FEATURE_CLZ, this version combines __clzdi2() with __clzsi2() into a single object with an efficient tail call. Also, this version merges the formerly separate Thumb and ARM code implementations into a unified instruction sequence. This change significantly improves

[PATCH v7 03/34] Fix syntax warnings on conditional instructions

2022-10-31 Thread Daniel Engel
gcc/libgcc/ChangeLog: 2022-10-09 Daniel Engel * config/arm/lib1funcs.S (RETLDM, ARM_DIV_BODY, ARM_MOD_BODY, _interwork_call_via_lr): Moved condition code after the flags update specifier "s". (ARM_FUNC_START, THUMB_LDIV0): Removed redundant ".syntax". --- libgcc/c

[PATCH v7 08/34] Refactor 64-bit shift functions into a new file

2022-10-31 Thread Daniel Engel
This will make it easier to isolate changes in subsequent patches. gcc/libgcc/ChangeLog: 2022-10-09 Daniel Engel * config/arm/lib1funcs.S (__ashldi3, __ashrdi3, __lshldi3): Moved to ... * config/arm/eabi/lshift.S: New file. --- libgcc/config/arm/eabi/lshift.S | 123 +

[PATCH v7 15/34] Import 'popcnt' functions from the CM0 library

2022-10-31 Thread Daniel Engel
The functional overlap between the single- and double-word functions makes this implementation about 30% smaller than the C functions if both functions are linked together in the same application. gcc/libgcc/ChangeLog: 2022-10-09 Daniel Engel * config/arm/popcnt.S (__popcountsi, __popcoun

[PATCH v7 05/34] Add the __HAVE_FEATURE_IT and IT() macros

2022-10-31 Thread Daniel Engel
These macros complement and extend the existing do_it() macro. Together, they streamline the process of optimizing short branchless conditional sequences to support ARM, Thumb-2, and Thumb-1. The inherent architecture limitations of Thumb-1 mean that writing assembly code is somewhat more tedious

[PATCH v7 02/34] Rename THUMB_FUNC_START to THUMB_FUNC_ENTRY

2022-10-31 Thread Daniel Engel
Since THUMB_FUNC_START does not insert the ".text" directive, it aligns more closely with the new FUNC_ENTRY macro and is renamed accordingly. THUMB_FUNC_START usage has been universally synonymous with the ".force_thumb" directive, so this is now folded into the definition. Usage of ".force_thumb"

[PATCH v7 06/34] Refactor 'clz' functions into a new file

2022-10-31 Thread Daniel Engel
This will make it easier to isolate changes in subsequent patches. gcc/libgcc/ChangeLog: 2022-10-09 Daniel Engel * config/arm/lib1funcs.S (__clzsi2i, __clzdi2): Moved to ... * config/arm/clz2.S: New file. --- libgcc/config/arm/clz2.S | 145 ++

[PATCH v7 07/34] Refactor 'ctz' functions into a new file

2022-10-31 Thread Daniel Engel
This will make it easier to isolate changes in subsequent patches. gcc/libgcc/ChangeLog: 2022-10-09 Daniel Engel * config/arm/lib1funcs.S (__ctzsi2): Moved to ... * config/arm/ctz2.S: New file. --- libgcc/config/arm/ctz2.S | 86 +++ libgcc/co

[PATCH v7 04/34] Reorganize LIB1ASMFUNCS object wrapper macros

2022-10-31 Thread Daniel Engel
This will make it easier to isolate changes in subsequent patches. gcc/libgcc/ChangeLog: 2022-10-09 Daniel Engel * config/arm/t-elf (LIB1ASMFUNCS): Split macros into logical groups. --- libgcc/config/arm/t-elf | 66 + 1 file changed, 53 insertions

[PATCH v7 01/34] Add and restructure function declaration macros

2022-10-31 Thread Daniel Engel
Most of these changes support subsequent patches in this series. Particularly, the FUNC_START macro becomes part of a new macro chain: * FUNC_ENTRY Common global symbol directives * FUNC_START_SECTION FUNC_ENTRY to start a new * FUNC_START FUNC_START_SECTION <

[PATCH v7 00/34] libgcc: Thumb-1 Floating-Point Assembly for Cortex M0

2022-10-31 Thread Daniel Engel
Hi Richard, I am re-submitting my libgcc patch from 2021: https://gcc.gnu.org/pipermail/gcc-patches/2021-January/563585.html https://gcc.gnu.org/pipermail/gcc-patches/2021-December/587383.html I believe I have finally made the stage1 window. Regards, Daniel --- Changes since v6:

optabs: Variable index vec_set

2022-10-31 Thread Robin Dapp via Gcc-patches
Hi, I'm looking into vec_set with variable index on s390. Uros posted a patch [1] that did not make it upstream in Nov 2020. It changed the mode of the index operand to whatever the target supports in can_vec_set_var_idx_p. I missed it back then but we indeed do not make proper use of vec_set w
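
For reference, a vector element store with a run-time index can be written directly with the GCC/Clang vector extensions. The sketch below uses hypothetical names (`v4si`, `set_elt`) and says nothing about how s390 or any other target expands it.

```c
#include <assert.h>

/* Vector extension type: four 32-bit ints.  */
typedef int v4si __attribute__ ((vector_size (16)));

/* Hypothetical helper: store X into element IDX of V, where IDX is
   only known at run time.  */
v4si set_elt (v4si v, int x, int idx)
{
  assert (idx >= 0 && idx < 4);
  v[idx] = x;
  return v;
}
```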

Re: [committed] libstdc++: Fix compare_exchange_padding.cc test for std::atomic_ref

2022-10-31 Thread Jonathan Wakely via Gcc-patches
On Mon, 31 Oct 2022 at 15:34, Eric Botcazou wrote: > > > The test was only failing for me with -m32 (and not -m64), so I didn't > > notice until now. That probably means we should make the test fail more > > reliably if the padding isn't being cleared. > > The tests fail randomly for me on SPARC64

[GCC][PATCH v2] arm: Add pacbti related multilib support for armv8.1-m.main.

2022-10-31 Thread Srinath Parvathaneni via Gcc-patches
Hi, This patch adds support for pacbti multilib linking by making "-mbranch-protection=none" the default in the command line for all M-profile targets and uses "-mbranch-protection=none" for multilib matching. If any valid value is passed to "-mbranch-protection" in the command line, this new

Re: [committed] libstdc++: Fix compare_exchange_padding.cc test for std::atomic_ref

2022-10-31 Thread Eric Botcazou via Gcc-patches
> The test was only failing for me with -m32 (and not -m64), so I didn't > notice until now. That probably means we should make the test fail more > reliably if the padding isn't being cleared. The tests fail randomly for me on SPARC64/Linux: FAIL: 29_atomics/atomic/compare_exchange_padding.cc ex

[ada, patch] fix libgnat build on x86_64-linux-gnux32 with glibc <= 2.31

2022-10-31 Thread Matthias Klose
This was introduced with the fix and backports of PR103530 on x86_64-linux-gnux32 with older glibc versions (checked with 2.31), where dladdr is still in the libdl.so library, and not included in libc.so as in newer glibc versions. Linking of libgnat.so fails with [...] /usr/x86_64-linux-gnux3

Re: Adding a new thread model to GCC

2022-10-31 Thread i.nixman--- via Gcc-patches
On 2022-10-31 09:18, Eric Botcazou wrote: Hi Eric! thank you very much for the job! I will try to build our (MinGW-Builds project) builds using this patch and will report back. @Jonathan what are the next steps to be taken to accept this patch? best! I have attached a revised version of th

[Patch] OpenMP/Fortran: 'target update' with strides + DT components

2022-10-31 Thread Tobias Burnus
I recently saw that gfortran does not support derived type components with 'target update', an OpenMP 5.0 feature. When adding it, I also found out that strides were not handled. There is probably some room for improvement about what to copy and what not, but copying too much should be fine. Bui

Re: [PATCH Rust front-end v3 01/46] Use DW_ATE_UTF for the Rust 'char' type

2022-10-31 Thread Tom Tromey via Gcc-patches
> "Mark" == Mark Wielaard writes: Mark> DW_LANG_Rust_old was used by old rustc compilers <= 2016 before DWARF5 Mark> assigned an official number. It might be recognized by some Mark> debuggers. FWIW I wouldn't worry about it any more. We could probably just remove the '_old' constant. Tom

[Ping x2] Re: [PATCH, nvptx, 1/2] Reimplement libgomp barriers for nvptx

2022-10-31 Thread Chung-Lin Tang
Ping x2. On 2022/10/17 10:29 PM, Chung-Lin Tang wrote: > Ping. > > On 2022/9/21 3:45 PM, Chung-Lin Tang via Gcc-patches wrote: >> Hi Tom, >> I had a patch submitted earlier, where I reported that the current way of >> implementing >> barriers in libgomp on nvptx created a quite significant perfo

Re: [PATCH] RISC-V: Change constexpr back to CONSTEXPR

2022-10-31 Thread Kito Cheng via Gcc-patches
Committed, thanks! On Fri, Oct 28, 2022 at 6:47 AM Jeff Law via Gcc-patches wrote: > > > On 10/27/22 08:41, juzhe.zh...@rivai.ai wrote: > > From: Ju-Zhe Zhong > > > > According to > > https://github.com/gcc-mirror/gcc/commit/f95d3d5de72a1c43e8d529bad3ef59afc3214705. > > Since GCC 4.8.6 doesn't

[committed] amdgcn: multi-size vector reductions

2022-10-31 Thread Andrew Stubbs
My recent patch to add additional vector lengths didn't address the vector reductions yet. This patch adds the missing support. Shorter vectors use fewer reduction steps, and the means to extract the final value has been adjusted. Lacking from this is any useful costs, so for loops the vect p

[committed] amdgcn: add fmin/fmax patterns

2022-10-31 Thread Andrew Stubbs
This patch adds patterns for the fmin and fmax operators, for scalars, vectors, and vector reductions. The compiler uses smin and smax for most floating-point optimizations, etc., but not where the user calls fmin/fmax explicitly. On amdgcn the hardware min/max instructions are already IEEE c

[committed] amdgcn: Silence unused parameter warning

2022-10-31 Thread Andrew Stubbs
A function parameter was left over from a previous draft of my multiple-vector-length patch. This patch silences the harmless warning. Andrew amdgcn: Silence unused parameter warning gcc/ChangeLog: * config/gcn/gcn.cc (gcn_simd_clone_compute_vecsize_and_simdlen): Set base_type a

RE: [GCC][PATCH v2] arm: Add cde feature support for Cortex-M55 CPU.

2022-10-31 Thread Srinath Parvathaneni via Gcc-patches
Hi, > -Original Message- > From: Christophe Lyon > Sent: Monday, October 17, 2022 2:30 PM > To: Srinath Parvathaneni ; gcc- > patc...@gcc.gnu.org > Cc: Richard Earnshaw > Subject: Re: [GCC][PATCH] arm: Add cde feature support for Cortex-M55 > CPU. > > Hi Srinath, > > > On 10/10/22 10:

Re: [PATCH]AArch64 Extend umov and sbfx patterns.

2022-10-31 Thread Richard Sandiford via Gcc-patches
Tamar Christina writes: > Hi All, > > Our zero and sign extend and extract patterns are currently very limited and > only work for the original register size of the instructions. i.e. limited by > GPI patterns. However these instructions extract bits and extend. This means > that any register si

[PATCH 8/8]AArch64: Have reload not choose to do add on the scalar side if both values exist on the SIMD side.

2022-10-31 Thread Tamar Christina via Gcc-patches
Hi All, Currently we often generate an r -> r add even if it means we need two reloads to perform it, i.e. in the case that the values are on the SIMD side. The pairwise operations expose these more now and so we get suboptimal codegen. Normally I would have liked to use ^ or $ here, but w

[PATCH 7/8]AArch64: Consolidate zero and sign extension patterns and add missing ones.

2022-10-31 Thread Tamar Christina via Gcc-patches
Hi All, The target has various zero and sign extension patterns. These however live in various locations around the MD file and almost all of them are split differently. Due to the various patterns we also ended up missing valid extensions. For instance smov is almost never generated. This cha

[PATCH 5/8]AArch64 aarch64: Make existing V2HF be usable.

2022-10-31 Thread Tamar Christina via Gcc-patches
Hi All, The backend has an existing V2HFmode that is used by pairwise operations. This mode was however never made fully functional. Amongst other things it was never declared as a vector type which made it unusable from the mid-end. It's also lacking an implementation for load/stores so reload

[PATCH 6/8]AArch64: Add peephole and scheduling logic for pairwise operations that appear late in RTL.

2022-10-31 Thread Tamar Christina via Gcc-patches
Hi All, Says what it does on the tin. In case some operations form in RTL due to a split, combine or any RTL pass then still try to recognize them. Bootstrapped Regtested on aarch64-none-linux-gnu and no issues. Ok for master? Thanks, Tamar gcc/ChangeLog: * config/aarch64/aarch64-sim

[PATCH 4/8]AArch64 aarch64: Implement widening reduction patterns

2022-10-31 Thread Tamar Christina via Gcc-patches
Hi All, This implements the new widening reduction optab in the backend. Instead of introducing a duplicate definition for the same thing I have renamed the intrinsics definitions to use the same optab. Bootstrapped Regtested on aarch64-none-linux-gnu and no issues. Ok for master? Thanks, Tamar

[PATCH 1/8]middle-end: Recognize scalar reductions from bitfields and array_refs

2022-10-31 Thread Tamar Christina via Gcc-patches
Hi All, This patch series is to add recognition of pairwise operations (reductions) in match.pd such that we can benefit from them even at -O1 when the vectorizer isn't enabled. The use of these allows for much simpler codegen in AArch64 and allows us to avoid quite a lot of codegen warts. As an

[PATCH 3/8]middle-end: Support extractions of subvectors from arbitrary element position inside a vector

2022-10-31 Thread Tamar Christina via Gcc-patches
Hi All, The current vector extract pattern can only extract from a vector when the position to extract is a multiple of the vector bitsize as a whole. That means extracting something like a V2SI from a V4SI vector at position 32 isn't possible as 32 is not a multiple of 64. Ideally this optab sho
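
The case described in this message can be pictured at the source level. The following is only a sketch using GCC's generic vector extensions; the type names v4si and v2si and the helper extract_at_bit32 are hypothetical and not taken from the patch. It shows a two-element subvector taken from bit position 32 (element 1) of a four-element vector, a position that is not a multiple of the 64-bit result size.

```c
/* Generic vector types via the GCC vector_size attribute.  */
typedef int v4si __attribute__ ((vector_size (16)));  /* 4 x 32-bit */
typedef int v2si __attribute__ ((vector_size (8)));   /* 2 x 32-bit */

/* Copy elements 1 and 2 of *src into *dst.  Element 1 starts at bit 32,
   which is not a multiple of the 64-bit size of the result vector.  */
void
extract_at_bit32 (const v4si *src, v2si *dst)
{
  v2si r = { (*src)[1], (*src)[2] };
  *dst = r;
}
```

Whether a given compiler version turns this into a single subvector extract depends on the target and on patches such as the one discussed; the sketch only illustrates the shape of the operation.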

[PATCH 2/8]middle-end: Recognize scalar widening reductions

2022-10-31 Thread Tamar Christina via Gcc-patches
Hi All, This adds a new optab and IFNs for REDUC_PLUS_WIDEN where the resulting scalar reduction has twice the precision of the input elements. At some point in a later patch I will also teach the vectorizer to recognize this builtin once I figure out how the various bits of reductions work. For
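
As a rough illustration of what a widening plus-reduction computes, here is a scalar sketch in C. The function name sum_widen is hypothetical; the only property taken from the message is that the result has twice the precision of the input elements, shown here with 32-bit inputs and a 64-bit accumulator.

```c
#include <stddef.h>
#include <stdint.h>

/* Sum 32-bit elements into a 64-bit result, so the accumulator has
   twice the precision of each input element.  */
int64_t
sum_widen (const int32_t *a, size_t n)
{
  int64_t acc = 0;
  for (size_t i = 0; i < n; i++)
    acc += (int64_t) a[i];   /* widen each element before adding */
  return acc;
}
```

Because each element is widened before the addition, summing two INT32_MAX values does not overflow, which a same-width reduction could not guarantee.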

[PATCH]AArch64 Extend umov and sbfx patterns.

2022-10-31 Thread Tamar Christina via Gcc-patches
Hi All, Our zero and sign extend and extract patterns are currently very limited and only work for the original register size of the instructions. i.e. limited by GPI patterns. However these instructions extract bits and extend. This means that any register size can be used as an input as long a

[PATCH 2/2]AArch64 Support new tbranch optab.

2022-10-31 Thread Tamar Christina via Gcc-patches
Hi All, This implements the new tbranch optab for AArch64. Instead of emitting the instruction directly I've chosen to expand the pattern using a zero extract and generating the existing pattern for comparisons for two reasons: 1. Allows for CSE of the actual comparison. 2. It looks like the

[PATCH 1/2]middle-end: Add new tbranch optab to add support for bit-test-and-branch operations

2022-10-31 Thread Tamar Christina via Gcc-patches
Hi All, This adds a new test-and-branch optab that can be used to do a conditional test of a bit and branch. This is similar to the cbranch optab but instead can test any arbitrary bit inside the register. This patch recognizes boolean comparisons and single bit mask tests. Bootstrapped Regtes
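
The two source shapes mentioned in this message, a single-bit mask test and a boolean comparison, can be sketched in C as follows. The function names are hypothetical, and whether a compiler actually emits a bit-test-and-branch instruction for them depends on the target and the optimization level; this is an illustration of the input patterns, not of the patch itself.

```c
#include <stdint.h>

/* Single-bit mask test: the branch depends only on bit 3 of x.  */
int
bit3_is_set (uint64_t x)
{
  if (x & (UINT64_C (1) << 3))
    return 1;
  return 0;
}

/* Boolean comparison: the branch depends only on the low bit of flag.  */
int
pick (_Bool flag, int a, int b)
{
  if (flag)
    return a;
  return b;
}
```

In both functions the condition is determined by one bit of a register, which is the situation a test-and-branch pattern is meant to cover.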
