Re: [PATCH] [i386]Fix tdpbf16ps testcase

2021-12-27 Thread Hongtao Liu via Gcc-patches
On Fri, Dec 24, 2021 at 4:51 PM Haochen Jiang via Gcc-patches wrote: > > Hi all, > > This patch fixes the testcase of amxbf16-dpbf16ps-2.c. Previously the type > conversion had some issues. > > Ok for trunk? Ok. > > BRs, > Haochen > > gcc/testsuite/ChangeLog: > > * gcc.target/i386/amx-check.h (

[PATCH] PR fortran/102332 - ICE in select_type_set_tmp, at fortran/match.c:6366

2021-12-27 Thread Harald Anlauf via Gcc-patches
Dear all, there are a couple of NULL pointer dereferences leading to improper error recovery when trying to handle Gerhard's testcases involving SELECT TYPE and invalid uses of CLASS variables. The fixes look pretty obvious to me, but I'm submitting here to check if there is more that should be d

Re: [PATCH] Make integer output faster in libgfortran

2021-12-27 Thread FX via Gcc-patches
Follow-up patch committed, after my use of the one-argument variant of static_assert() broke bootstrap on Solaris (sorry Rainer!). The one-arg form is new since C23, while Solaris only supports the two-arg form (C11). I have confirmed that other target libraries use the two-arg form, and boots
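The portability point above can be illustrated with a minimal sketch. It is not taken from the libgfortran patch: the checked condition and the helper name bits_in_int are assumptions chosen for illustration. The two-argument form compiles under C11, while dropping the message string requires C23.

```c
#include <assert.h>   /* provides the static_assert macro in C11 */
#include <limits.h>

/* Two-argument form: accepted by any C11 compiler. */
static_assert(CHAR_BIT == 8, "this sketch assumes 8-bit bytes");

/* One-argument form, valid only from C23 onwards, shown commented out:
   static_assert(CHAR_BIT == 8);
*/

/* Hypothetical helper so the example has something to call. */
int bits_in_int(void)
{
    return (int)(sizeof(int) * CHAR_BIT);
}
```

Keeping the message string costs nothing on newer compilers, so the two-argument form is the safer choice for code that must build on older toolchains.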

[PATCH v6 34/34] Add -mpure-code support to the CM0 functions.

2021-12-27 Thread Daniel Engel
gcc/libgcc/ChangeLog: 2021-01-16 Daniel Engel Makefile.in (MPURE_CODE): New macro defines __PURE_CODE__. (gcc_compile): Appended MPURE_CODE. lib1funcs.S (FUNC_START_SECTION): Set flags for __PURE_CODE__. clz2.S (__clzsi2): Added -mpure-code compatible instructions.

[PATCH v6 33/34] Drop single-precision Thumb-1 soft-float functions

2021-12-27 Thread Daniel Engel
With the complete CM0 library integrated, regression testing showed new failures with the message "compilation failed to produce executable": gcc.dg/fixed-point/convert-float-1.c gcc.dg/fixed-point/convert-float-3.c gcc.dg/fixed-point/convert-sat.c Investigating, this appears to be ca

[PATCH v6 32/34] Import float<->__fp16 conversion from the CM0 library

2021-12-27 Thread Daniel Engel
gcc/libgcc/ChangeLog: 2021-01-13 Daniel Engel * config/arm/eabi/fcast.S (__aeabi_h2f, __aeabi_f2h): Added functions. * config/arm/fp16 (__gnu_f2h_ieee, __gnu_h2f_ieee, __gnu_f2h_alternative, __gnu_h2f_alternative): Disable build for v6m multilibs. * config/arm/t-b

[PATCH v6 31/34] Import float<->double conversion from the CM0 library

2021-12-27 Thread Daniel Engel
gcc/libgcc/ChangeLog: 2021-01-13 Daniel Engel * config/arm/eabi/fcast.S (__aeabi_d2f, __aeabi_f2d): New file. * config/arm/lib1funcs.S: #include eabi/fcast.S (v6m only). * config/arm/t-elf (LIB1ASMFUNCS): Added _arm_d2f and _arm_f2d. --- libgcc/config/arm/eabi/fcast.S | 2

[PATCH v6 30/34] Import float-to-integer conversion from the CM0 library

2021-12-27 Thread Daniel Engel
gcc/libgcc/ChangeLog: 2021-01-13 Daniel Engel * config/arm/bpabi-lib.h (muldi3): Removed duplicate. (fixunssfsi) Removed obsolete RENAME_LIBRARY directive. * config/arm/eabi/ffixed.S (__aeabi_f2iz, __aeabi_f2uiz, __aeabi_f2lz, __aeabi_f2ulz): New file. * co

[PATCH v6 29/34] Import integer-to-float conversion from the CM0 library

2021-12-27 Thread Daniel Engel
gcc/libgcc/ChangeLog: 2021-01-13 Daniel Engel * config/arm/bpabi-lib.h (__floatdisf, __floatundisf): Remove obsolete RENAME_LIBRARY directives. * config/arm/eabi/ffloat.S (__aeabi_i2f, __aeabi_l2f, __aeabi_ui2f, __aeabi_ul2f): New file. * config/arm/lib1fun

[PATCH v6 28/34] Import float division from the CM0 library

2021-12-27 Thread Daniel Engel
gcc/libgcc/ChangeLog: 2021-01-08 Daniel Engel * config/arm/eabi/fdiv.S (__divsf3, __fp_divloopf): New file. * config/arm/lib1funcs.S: #include eabi/fdiv.S (v6m only). * config/arm/t-elf (LIB1ASMFUNCS): Added _divsf3 and _fp_divloopf. --- libgcc/config/arm/eabi/fdiv.S | 26

[PATCH v6 27/34] Import float multiplication from the CM0 library

2021-12-27 Thread Daniel Engel
gcc/libgcc/ChangeLog: 2021-01-13 Daniel Engel * config/arm/eabi/fmul.S (__mulsf3): New file. * config/arm/lib1funcs.S: #include eabi/fmul.S (v6m only). * config/arm/t-elf (LIB1ASMFUNCS): Moved _mulsf3 to global scope (this object was previously blocked on v6m build

[PATCH v6 26/34] Import float addition and subtraction from the CM0 library

2021-12-27 Thread Daniel Engel
Since this is the first import of single-precision functions, some common parsing and formatting routines are also included. These common routines will be referenced by other functions in subsequent commits. However, even if the size penalty is accounted entirely to __addsf3(), the total compiled s

[PATCH v6 25/34] Refactor Thumb-1 float subtraction into a new file

2021-12-27 Thread Daniel Engel
This will make it easier to isolate changes in subsequent patches. gcc/libgcc/ChangeLog: 2021-01-13 Daniel Engel * config/arm/bpabi-v6m.S (__aeabi_frsub): Moved to ... * config/arm/eabi/fadd.S: New file. * config/arm/lib1funcs.S: #include eabi/fadd.S (v6m only). --- libg

[PATCH v6 24/34] Import float comparison from the CM0 library

2021-12-27 Thread Daniel Engel
These functions are significantly smaller and faster than the wrapper functions and soft-float implementation they replace. Using the first comparison operator (e.g. '<=') in any program costs about 70 bytes initially, but every additional operator incrementally adds just 4 bytes. NOTE: It seems

[PATCH v6 23/34] Refactor Thumb-1 float comparison into a new file

2021-12-27 Thread Daniel Engel
gcc/libgcc/ChangeLog: 2021-01-13 Daniel Engel * config/arm/bpabi-v6m.S (__aeabi_cfcmpeq, __aeabi_cfcmple, __aeabi_cfrcmple, __aeabi_fcmpeq, __aeabi_fcmple, aeabi_fcmple, __aeabi_fcmpgt, aeabi_fcmpge): Moved to ... * config/arm/eabi/fcmp.S: New file. * confi

[PATCH v6 22/34] Import integer multiplication from the CM0 library

2021-12-27 Thread Daniel Engel
gcc/libgcc/ChangeLog: 2021-01-07 Daniel Engel * config/arm/eabi/lmul.S: New file for __muldi3(), __mulsidi3(), and __umulsidi3(). * config/arm/lib1funcs.S: #include eabi/lmul.S (v6m only). * config/arm/t-elf: Add the new objects to LIB1ASMFUNCS. --- libgcc/config/arm/eab

[PATCH v6 21/34] Import 64-bit division from the CM0 library

2021-12-27 Thread Daniel Engel
gcc/libgcc/ChangeLog: 2021-01-13 Daniel Engel * config/arm/bpabi.c: Deleted unused file. * config/arm/eabi/ldiv.S (__aeabi_ldivmod, __aeabi_uldivmod): Replaced wrapper functions with a complete implementation. * config/arm/t-bpabi (LIB2ADD_ST): Removed bpabi.c.

[PATCH v6 20/34] Refactor Thumb-1 64-bit division into a new file

2021-12-27 Thread Daniel Engel
gcc/libgcc/ChangeLog: 2021-01-13 Daniel Engel * config/arm/bpabi-v6m.S (__aeabi_ldivmod/ldivmod): Moved to ... * config/arm/eabi/ldiv.S: New file. * config/arm/lib1funcs.S: #include eabi/ldiv.S (v6m only). --- libgcc/config/arm/bpabi-v6m.S | 81 -

[PATCH v6 19/34] Import 32-bit division from the CM0 library

2021-12-27 Thread Daniel Engel
gcc/libgcc/ChangeLog: 2021-01-07 Daniel Engel * config/arm/eabi/idiv.S: New file for __udivsi3() and __divsi3(). * config/arm/lib1funcs.S: #include eabi/idiv.S (v6m only). --- libgcc/config/arm/eabi/idiv.S | 299 ++ libgcc/config/arm/lib1funcs.S |

[PATCH v6 18/34] Merge Thumb-2 optimizations for 64-bit comparison

2021-12-27 Thread Daniel Engel
This effectively merges support for all architecture variants into a common function path with appropriate build conditions. ARM performance is 1-2 instructions faster; Thumb-2 is about 50% faster. gcc/libgcc/ChangeLog: 2021-01-13 Daniel Engel * config/arm/bpabi.S (__aeabi_lcmp, __aeabi_

[PATCH v6 17/34] Import 64-bit comparison from CM0 library

2021-12-27 Thread Daniel Engel
These are 2-5 instructions smaller and just as fast. Branches are minimized, which will allow easier adaptation to Thumb-2/ARM mode. gcc/libgcc/ChangeLog: 2021-01-13 Daniel Engel * config/arm/eabi/lcmp.S (__aeabi_lcmp, __aeabi_ulcmp): Replaced; add macro configuration to build _

[PATCH v6 16/34] Refactor Thumb-1 64-bit comparison into a new file

2021-12-27 Thread Daniel Engel
This will make it easier to isolate changes in subsequent patches. gcc/libgcc/ChangeLog: 2021-01-13 Daniel Engel * config/arm/bpabi-v6m.S (__aeabi_lcmp, __aeabi_ulcmp): Moved to ... * config/arm/eabi/lcmp.S: New file. * config/arm/lib1funcs.S: #include eabi/lcmp.S. --- l

[PATCH v6 15/34] Import 'popcnt' functions from the CM0 library

2021-12-27 Thread Daniel Engel
The functional overlap between the single- and double-word functions makes this implementation about 30% smaller than the C functions if both functions are linked together in the same application. gcc/libgcc/ChangeLog: 2021-01-13 Daniel Engel * config/arm/popcnt.S (__popcountsi, __popcoun

[PATCH v6 14/34] Import 'parity' functions from the CM0 library

2021-12-27 Thread Daniel Engel
The functional overlap between the single- and double-word functions makes this implementation about half the size of the C functions if both functions are linked in the same application. gcc/libgcc/ChangeLog: 2021-01-13 Daniel Engel * config/arm/parity.S: New file for __
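One way such overlap between single- and double-word functions can arise is sketched below in C. This is an assumption about the general technique, not the assembly in parity.S; the names parity32 and parity64 are hypothetical. The 64-bit parity folds its two halves together with XOR and then reuses the 32-bit routine unchanged.

```c
#include <stdint.h>

/* Parity of a 32-bit word: 1 if an odd number of bits are set, else 0.
   Each XOR-shift folds the upper half of the remaining bits onto the lower. */
int parity32(uint32_t x)
{
    x ^= x >> 16;
    x ^= x >> 8;
    x ^= x >> 4;
    x ^= x >> 2;
    x ^= x >> 1;
    return (int)(x & 1u);
}

/* Parity of a 64-bit word: XOR the halves, then finish in parity32().
   The call is in tail position, so the two functions can share one body. */
int parity64(uint64_t x)
{
    return parity32((uint32_t)(x >> 32) ^ (uint32_t)x);
}
```

Because the double-word version adds only the fold of the two halves, linking both functions costs little more than linking the single-word one.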

[PATCH v6 13/34] Import 'ffs' functions from the CM0 library

2021-12-27 Thread Daniel Engel
This implementation provides an efficient tail call to __clzdi2(), making the functions rather smaller and faster than the C versions. gcc/libgcc/ChangeLog: 2021-01-13 Daniel Engel * config/arm/bits/ctz2.S (__ffssi2, __ffsdi2): New functions. * config/arm/t-elf (LIB1ASMFUNCS): Ad
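A find-first-set routine can end in a count-leading-zeros call as in the following C sketch. It assumes a clz helper that returns 32 for a zero input; the names clz32 and ffs32 are hypothetical and the code is not the patch's assembly. Isolating the lowest set bit with x & -x turns "index of the lowest bit" into "32 minus the leading zeros".

```c
#include <stdint.h>

/* Portable count-leading-zeros with a defined result of 32 for zero. */
int clz32(uint32_t x)
{
    int n = 0;
    if (x == 0)
        return 32;
    while ((x & 0x80000000u) == 0) {
        x <<= 1;
        n++;
    }
    return n;
}

/* ffs: 1-based index of the least significant set bit, or 0 if x is 0. */
int ffs32(uint32_t x)
{
    uint32_t lowest = x & (0u - x);   /* isolate the lowest set bit */
    return 32 - clz32(lowest);        /* zero input yields 32 - 32 = 0 */
}
```

The zero case needs no separate branch in ffs32 because the helper already maps zero to 32.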

[PATCH v6 12/34] Import 'clrsb' functions from the CM0 library

2021-12-27 Thread Daniel Engel
This implementation provides an efficient tail call to __clzsi2(), making the functions rather smaller and faster than the C versions. gcc/libgcc/ChangeLog: 2021-01-13 Daniel Engel * config/arm/bits/clz2.S (__clrsbsi2, __clrsbdi2): Added new functions. * config/arm/t-elf

[PATCH v6 11/34] Import 64-bit shift functions from the CM0 library

2021-12-27 Thread Daniel Engel
The Thumb versions of these functions are each 1-2 instructions smaller and faster, and branchless when the IT instruction is available. The ARM versions were converted to the "xxl/xxh" big-endian register naming convention, but are otherwise unchanged. gcc/libgcc/ChangeLog: 2021-01-13 Daniel Eng

[PATCH v6 10/34] Import 'ctz' functions from the CM0 library

2021-12-27 Thread Daniel Engel
This version combines __ctzdi2() with __ctzsi2() into a single object with an efficient tail call. The former implementation of __ctzdi2() was in C. On architectures without __ARM_FEATURE_CLZ, this version merges the formerly separate Thumb and ARM code sequences into a unified instruction sequen

[PATCH v6 09/34] Import 'clz' functions from the CM0 library

2021-12-27 Thread Daniel Engel
On architectures without __ARM_FEATURE_CLZ, this version combines __clzdi2() with __clzsi2() into a single object with an efficient tail call. Also, this version merges the formerly separate Thumb and ARM code implementations into a unified instruction sequence. This change significantly improves
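The idea of combining the double-word and single-word functions can be sketched in C as follows. This is an illustration under assumptions, not the patch's assembly: the names clz32_sketch and clz64_sketch are hypothetical, and the helper is a simple binary search rather than whatever sequence the library uses.

```c
#include <stdint.h>

/* Count leading zeros of a 32-bit word by binary search; returns 32 for 0. */
int clz32_sketch(uint32_t x)
{
    int n = 0;
    if (x == 0)
        return 32;
    if ((x & 0xFFFF0000u) == 0) { n += 16; x <<= 16; }
    if ((x & 0xFF000000u) == 0) { n += 8;  x <<= 8;  }
    if ((x & 0xF0000000u) == 0) { n += 4;  x <<= 4;  }
    if ((x & 0xC0000000u) == 0) { n += 2;  x <<= 2;  }
    if ((x & 0x80000000u) == 0) { n += 1; }
    return n;
}

/* 64-bit version: pick the relevant half, then finish in the 32-bit code. */
int clz64_sketch(uint64_t x)
{
    uint32_t hi = (uint32_t)(x >> 32);
    if (hi != 0)
        return clz32_sketch(hi);
    return 32 + clz32_sketch((uint32_t)x);
}
```

In assembly the "32 +" bias can be carried in a register so that both paths end in a plain jump into the 32-bit routine; the C version only approximates that structure.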

[PATCH v6 08/34] Refactor 64-bit shift functions into a new file

2021-12-27 Thread Daniel Engel
This will make it easier to isolate changes in subsequent patches. gcc/libgcc/ChangeLog: 2021-01-13 Daniel Engel * config/arm/lib1funcs.S (__ashldi3, __ashrdi3, __lshrdi3): Moved to ... * config/arm/eabi/lshift.S: New file. --- libgcc/config/arm/eabi/lshift.S | 123 +

[PATCH v6 07/34] Refactor 'ctz' functions into a new file

2021-12-27 Thread Daniel Engel
This will make it easier to isolate changes in subsequent patches. gcc/libgcc/ChangeLog: 2021-01-13 Daniel Engel * config/arm/lib1funcs.S (__ctzsi2): Moved to ... * config/arm/ctz2.S: New file. --- libgcc/config/arm/ctz2.S | 86 +++ libgcc/co

[PATCH v6 06/34] Refactor 'clz' functions into a new file

2021-12-27 Thread Daniel Engel
This will make it easier to isolate changes in subsequent patches. gcc/libgcc/ChangeLog: 2021-01-13 Daniel Engel * config/arm/lib1funcs.S (__clzsi2, __clzdi2): Moved to ... * config/arm/clz2.S: New file. --- libgcc/config/arm/clz2.S | 145 ++

[PATCH v6 05/34] Add the __HAVE_FEATURE_IT and IT() macros

2021-12-27 Thread Daniel Engel
These macros complement and extend the existing do_it() macro. Together, they streamline the process of optimizing short branchless conditional sequences to support ARM, Thumb-2, and Thumb-1. The inherent architecture limitations of Thumb-1 mean that writing assembly code is somewhat more tedious

[PATCH v6 04/34] Reorganize LIB1ASMFUNCS object wrapper macros

2021-12-27 Thread Daniel Engel
This will make it easier to isolate changes in subsequent patches. gcc/libgcc/ChangeLog: 2021-01-14 Daniel Engel * config/arm/t-elf (LIB1ASMFUNCS): Split macros into logical groups. --- libgcc/config/arm/t-elf | 66 + 1 file changed, 53 insertions

[PATCH v6 03/34] Fix syntax warnings on conditional instructions

2021-12-27 Thread Daniel Engel
gcc/libgcc/ChangeLog: 2021-01-14 Daniel Engel * config/arm/lib1funcs.S (RETLDM, ARM_DIV_BODY, ARM_MOD_BODY, _interwork_call_via_lr): Moved condition code after the flags update specifier "s". (ARM_FUNC_START, THUMB_LDIV0): Removed redundant ".syntax". --- libgcc/c

[PATCH v6 02/34] Rename THUMB_FUNC_START to THUMB_FUNC_ENTRY

2021-12-27 Thread Daniel Engel
Since THUMB_FUNC_START does not insert the ".text" directive, it aligns more closely with the new FUNC_ENTRY macro and is renamed accordingly. THUMB_FUNC_START usage has been universally synonymous with the ".force_thumb" directive, so this is now folded into the definition. Usage of ".force_thumb"

[PATCH v6 01/34] Add and restructure function declaration macros

2021-12-27 Thread Daniel Engel
Most of these changes support subsequent patches in this series. Particularly, the FUNC_START macro becomes part of a new macro chain: * FUNC_ENTRY Common global symbol directives * FUNC_START_SECTION FUNC_ENTRY to start a new * FUNC_START FUNC_START_SECTION <

[PATCH v6 00/34] libgcc: Thumb-1 Floating-Point Assembly for Cortex M0

2021-12-27 Thread Daniel Engel
Hi Richard, I am re-submitting my libgcc patch from last year: https://gcc.gnu.org/pipermail/gcc-patches/2021-January/563585.html I clearly missed the stage1 window again. However, since the patch rebased cleanly onto gcc-12 with no regressions, and it's not quite stage4 yet, I figured

[committed] hppa: Improve atomic store implementation on hppa-linux

2021-12-27 Thread John David Anglin
Atomic stores on hppa-linux must be synthesized using the kernel light-weight system calls. Instead of using a compare and swap loop, it is more efficient to use the __sync_lock_test_and_set routines in libgcc. Tested on hppa-unknown-linux-gnu. Committed to trunk and gcc-11. Dave --- Improve ato
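The difference between the two strategies can be sketched with GCC's legacy __sync builtins. This is a user-level illustration only, not the hppa back-end change; the function names are hypothetical and a GCC-compatible compiler is assumed. A compare-and-swap loop may have to retry, whereas an exchange performs the store in a single operation.

```c
/* Store via a compare-and-swap loop: reread and retry until the swap wins. */
void store_with_cas_loop(volatile int *p, int v)
{
    int old;
    do {
        old = *p;
    } while (!__sync_bool_compare_and_swap(p, old, v));
}

/* Store via an atomic exchange: one operation, previous value discarded.
   Note: GCC documents __sync_lock_test_and_set as an acquire barrier,
   not a full barrier. */
void store_with_exchange(volatile int *p, int v)
{
    (void)__sync_lock_test_and_set(p, v);
}
```

On targets where these builtins expand to library or kernel-assisted calls, avoiding the retry loop saves at least one extra read and a possible second call per store.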

[committed] aarch64: Fix mismatched extern "C" block [PR100985]

2021-12-27 Thread Jonathan Wakely via Gcc-patches
Untested, committed as obvious, fixing the 9.4.0 regression introduced by r9-8936. gcc/ChangeLog: PR target/100985 * config/aarch64/arm_acle.h: Remove unclosed extern "C" block. --- gcc/config/aarch64/arm_acle.h | 4 1 file changed, 4 deletions(-) diff --git a/gcc/config/a

Re: [2/2] PR96463 -- changes to type checking vec_perm_expr in middle end

2021-12-27 Thread Prathamesh Kulkarni via Gcc-patches
On Fri, 17 Dec 2021 at 16:37, Richard Sandiford wrote: > > Prathamesh Kulkarni writes: > > Hi, > > The attached patch rearranges order of type-check for vec_perm_expr > > and relaxes type checking for > > lhs = vec_perm_expr > > > > when: > > rhs1 == rhs2, > > lhs is variable length vector, > > r

Re: [1/2] PR96463 - aarch64 specific changes

2021-12-27 Thread Prathamesh Kulkarni via Gcc-patches
On Fri, 17 Dec 2021 at 17:03, Richard Sandiford wrote: > > Prathamesh Kulkarni writes: > > Hi, > > The patch folds: > > lhs = svld1rq ({-1, -1, -1, ...}, &v[0]) > > into: > > lhs = vec_perm_expr > > and expands above vec_perm_expr using aarch64_expand_sve_dupq. > > > > With patch, for following t