Re: [PATCH v3] libgcc: Thumb-1 Floating-Point Library for Cortex M0
Hi Christophe,

On Thu, Jan 21, 2021, at 2:29 AM, Christophe Lyon wrote:
> On Sat, 16 Jan 2021 at 17:13, Daniel Engel wrote:
> >
> > Hi Christophe,
> >
> > On Fri, Jan 15, 2021, at 4:30 AM, Christophe Lyon wrote:
> > > On Fri, 15 Jan 2021 at 12:39, Daniel Engel wrote:
> > > >
> > > > Hi Christophe,
> > > >
> > > > On Mon, Jan 11, 2021, at 8:39 AM, Christophe Lyon wrote:
> > > > > On Mon, 11 Jan 2021 at 17:18, Daniel Engel wrote:
> > > > > >
> > > > > > On Mon, Jan 11, 2021, at 8:07 AM, Christophe Lyon wrote:
> > > > > > > On Sat, 9 Jan 2021 at 14:09, Christophe Lyon wrote:
> > > > > > > >
> > > > > > > > On Sat, 9 Jan 2021 at 13:27, Daniel Engel wrote:
> > > > > > > > >
> > > > > > > > > On Thu, Jan 7, 2021, at 4:56 AM, Richard Earnshaw wrote:
> > > > > > > > > > On 07/01/2021 00:59, Daniel Engel wrote:
> > > > > > > > > > > --snip--
> > > > > > > > > > >
> > > > > > > > > > > On Wed, Jan 6, 2021, at 9:05 AM, Richard Earnshaw wrote:
> > > > > > > > > > > --snip--
> > > > > > > > > > >
> > > > > > > > > > >> - finally, your popcount implementations have data in
> > > > > > > > > > >> the code segment.  That's going to cause problems when
> > > > > > > > > > >> we have compilation options such as -mpure-code.
> > > > > > > > > > >
> > > > > > > > > > > I am just following the precedent of existing lib1funcs
> > > > > > > > > > > (e.g. __clz2si).  If this matters, you'll need to point
> > > > > > > > > > > in the right direction for the fix.  I'm not sure it
> > > > > > > > > > > does matter, since these functions are PIC anyway.
> > > > > > > > > >
> > > > > > > > > > That might be a bug in the clz implementations -
> > > > > > > > > > Christophe: Any thoughts?
> > > > > > > > >
> > > > > > > > > __clzsi2() has test coverage in
> > > > > > > > > "gcc.c-torture/execute/builtin-bitops-1.c"
> > > > > > > >
> > > > > > > > Thanks, I'll have a closer look at why I didn't see problems.
> > > > > > > >
> > > > > > > So, that's because the code goes to the .text section (as opposed
> > > > > > > to .text.noread) and does not have the PURECODE flag.  The
> > > > > > > compiler takes care of this when generating code with -mpure-code.
> > > > > > > And the simulator does not complain because it only checks loads
> > > > > > > from the segment with the PURECODE flag set.
> > > > > > >
> > > > > > This is far out of my depth, but can something like:
> > > > > >
> > > > > > ifeq (,$(findstring __symbian__,$(shell $(gcc_compile_bare) -dM -E -
> > > > > >
> > > > > > be adapted to:
> > > > > >
> > > > > > a) detect the state of the -mpure-code switch, and
> > > > > > b) pass that flag to the preprocessor?
> > > > > >
> > > > > > If so, I can probably fix both the target section and the data
> > > > > > usage.  Just have to add a few instructions to finish unrolling
> > > > > > the loop.
> > > > >
> > > > > I must confess I never checked libgcc's Makefile deeply before,
> > > > > but it looks like you can probably detect whether -mpure-code is
> > > > > part of $CFLAGS.
> > > > >
> > > > > However, it mi
Re: [PATCH v3] libgcc: Thumb-1 Floating-Point Library for Cortex M0
> > > --snip--
> > >
> > > If the test server farm is free at some point, would you mind running
> > > another set of regression tests on my v5 patch series?
> >
> > Sure.  Given the number of sub-patches, can you send it to me as a
> > single patch file (git format) that I can directly apply to GCC trunk?
> > My mailer does not want to help with saving each patch as a proper
> > patch file :-(
> >
> The validation results came back clean (no regression found).
> Thanks
>
> Christophe

Appreciate the update.  Seems that the linker "bug" really was all that
I was fighting there at the end (see patch number 33/33).

I did see the announcement for stage 4 last week, so I think this is all
I can do for now.  With luck I will be back in October or so.

Thanks again,
Daniel

> > > Thanks
> > >
> > > Christophe
> > >
> > >
> > > Regards,
> > > Daniel
>
[PATCH v7 00/34] libgcc: Thumb-1 Floating-Point Assembly for Cortex M0
    Function(s)                   Size (bytes)        Cycles            Stack  Accuracy (lsb)
    __shared_float                178
    __shared_float (OPTIMIZE_SIZE) 154

    __addsf3 (__aeabi_fadd)       116+__shared_float  31..76            8      <= 0.5 ulp
    __addsf3 (OPTIMIZE_SIZE)      112+__shared_float  74                8      <= 0.5 ulp
    __subsf3 (__aeabi_fsub)       6+__addsf3          3+__addsf3        8      <= 0.5 ulp
    __aeabi_frsub                 8+__addsf3          6+__addsf3        8      <= 0.5 ulp
    __mulsf3 (__aeabi_fmul)       112+__shared_float  73..97            8      <= 0.5 ulp
    __mulsf3 (OPTIMIZE_SIZE)      96+__shared_float   93                8      <= 0.5 ulp
    __divsf3 (__aeabi_fdiv)       132+__shared_float  83..361           8      <= 0.5 ulp
    __divsf3 (OPTIMIZE_SIZE)      120+__shared_float  263..359          8      <= 0.5 ulp

    __cmpsf2/__lesf2/__ltsf2      72                  33                0      exact
    __eqsf2/__nesf2               4+__cmpsf2          3+__cmpsf2        0      exact
    __gesf2/__gtsf2               4+__cmpsf2          3+__cmpsf2        0      exact
    __unordsf2 (__aeabi_fcmpun)   4+__cmpsf2          3+__cmpsf2        0      exact
    __aeabi_fcmpeq                4+__cmpsf2          3+__cmpsf2        0      exact
    __aeabi_fcmpne                4+__cmpsf2          3+__cmpsf2        0      exact
    __aeabi_fcmplt                4+__cmpsf2          3+__cmpsf2        0      exact
    __aeabi_fcmple                4+__cmpsf2          3+__cmpsf2        0      exact
    __aeabi_fcmpge                4+__cmpsf2          3+__cmpsf2        0      exact

    __floatundisf (__aeabi_ul2f)  14+__shared_float   40..81            8      <= 0.5 ulp
    __floatundisf (OPTIMIZE_SIZE) 14+__shared_float   40..237           8      <= 0.5 ulp
    __floatunsisf (__aeabi_ui2f)  0+__floatundisf     1+__floatundisf   8      <= 0.5 ulp
    __floatdisf (__aeabi_l2f)     14+__floatundisf    7+__floatundisf   8      <= 0.5 ulp
    __floatsisf (__aeabi_i2f)     0+__floatdisf       1+__floatdisf     8      <= 0.5 ulp

    __fixsfdi (__aeabi_f2lz)      74                  27..33            0      exact
    __fixunssfdi (__aeabi_f2ulz)  4+__fixsfdi         3+__fixsfdi       0      exact
    __fixsfsi (__aeabi_f2iz)      52                  19                0      exact
    __fixsfsi (OPTIMIZE_SIZE)     4+__fixsfdi         3+__fixsfdi       0      exact
    __fixunssfsi (__aeabi_f2uiz)  4+__fixsfsi         3+__fixsfsi       0      exact

    __extendsfdf2 (__aeabi_f2d)   42+__shared_float   38                8      exact
    __truncdfsf2 (__aeabi_d2f)    88                  34                8      exact
    __aeabi_d2f                   56+__shared_float   54..58            8      <= 0.5 ulp
    __aeabi_h2f                   34+__shared_float   34                8      exact
    __aeabi_f2h                   84                  23..34            0      <= 0.5 ulp

Composite entries such as "6+__addsf3" give the function's own size or cycle
count plus that of the helper it tail-calls into.

Copyright assignment is on file with the FSF.
Thanks,
Daniel Engel

[1] // Test program for size comparison
    extern int main (void)
    {
        volatile int x = 1;
        volatile unsigned long long int y = 10;
        volatile long long int z = x / y; // 64-bit division

        volatile float a = x; // 32-bit casting
        volatile float b = y; // 64 bit casting
        volatile float c = z / b; // float division
        volatile float d = a + c; // float addition
        volatile float e = c * b; // float multiplication
        volatile float f = d - e - c; // float subtraction

        if (f != c) // float comparison
            y -= (long long int)d; // float casting
    }

[2] http://danielengel.com/cm0_test_vectors.tgz
[3] http://www.netlib.org/fp/ucbtest.tgz
[4] http://www.jhauser.us/arithmetic/TestFloat.html
[5] http://win-www.uia.ac.be/u/cant/ieeecc754.html

---

Daniel Engel (34):
  Add and restructure function declaration macros
  Rename THUMB_FUNC_START to THUMB_FUNC_ENTRY
  Fix syntax warnings on conditional instructions
  Reorganize LIB1ASMFUNCS object wrapper macros
  Add the __HAVE_FEATURE_IT and IT() macros
  Refactor 'clz' functions into a new file
  Refactor 'ctz' functions into a new file
  Refactor 64-bit shift functions into a new file
  Import 'clz' functions from the CM0 library
  Import 'ctz' functions from the CM0 library
  Import 64-bit shift functions from the CM0 library
  Import 'clrsb' functions from the CM0 library
  Import 'ffs' functions from the CM0 library
  Import 'parity' functions from the CM0 library
  Import 'popcnt' functions from the CM0 library
  Refactor Thumb-1 64-bit comparison into a new file
  Import 64-bit comparison from CM0 library
  Merge Thumb-2 optimizations for 64-bit comparison
  Import 32-bit division from the CM0 library
  Refactor Thumb-1 64-bit division into a new file
  Import 64-bit division from the CM0 library
  Import integer multipli
[PATCH v7 01/34] Add and restructure function declaration macros
Most of these changes support subsequent patches in this series.
Particularly, the FUNC_START macro becomes part of a new macro chain:

 * FUNC_ENTRY          Common global symbol directives
 * FUNC_START_SECTION  FUNC_ENTRY to start a new <section>
 * FUNC_START          FUNC_START_SECTION <".text">

The effective definition of FUNC_START is unchanged from the previous
version of lib1funcs.  See code comments for detailed usage.

The new names FUNC_ENTRY and FUNC_START_SECTION were chosen specifically
to complement the existing FUNC_START name.  Alternate name patterns are
possible (such as {FUNC_SYMBOL, FUNC_START_SECTION, FUNC_START_TEXT}),
but any change to FUNC_START would require refactoring much of libgcc.

Additionally, a parallel chain of new macros supports weak functions:

 * WEAK_ENTRY
 * WEAK_START_SECTION
 * WEAK_START
 * WEAK_ALIAS

Moving the CFI_* macros earlier in the file scope will increase their
scope for use in additional functions.

gcc/libgcc/ChangeLog:
2022-10-09  Daniel Engel

        * config/arm/lib1funcs.S:
        (LLSYM): New macro prefix ".L" for strippable local symbols.
        (CFI_START_FUNCTION, CFI_END_FUNCTION): Moved earlier in the file.
        (FUNC_ENTRY): New macro for symbols with no ".section" directive.
        (WEAK_ENTRY): New macro FUNC_ENTRY + ".weak".
        (FUNC_START_SECTION): New macro FUNC_ENTRY with <section> argument.
        (WEAK_START_SECTION): New macro FUNC_START_SECTION + ".weak".
        (FUNC_START): Redefined in terms of FUNC_START_SECTION <".text">.
        (WEAK_START): New macro FUNC_START + ".weak".
        (WEAK_ALIAS): New macro FUNC_ALIAS + ".weak".
        (FUNC_END): Moved after FUNC_START macro group.
        (THUMB_FUNC_START): Moved near the other *FUNC* macros.
        (THUMB_SYNTAX, ARM_SYM_START, SYM_END): Deleted unused macros.
---
 libgcc/config/arm/lib1funcs.S | 109 +-
 1 file changed, 69 insertions(+), 40 deletions(-)

diff --git a/libgcc/config/arm/lib1funcs.S b/libgcc/config/arm/lib1funcs.S
index 8c39c9f20a2..a4fa62b3832 100644
--- a/libgcc/config/arm/lib1funcs.S
+++ b/libgcc/config/arm/lib1funcs.S
@@ -69,11 +69,13 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 #define TYPE(x) .type SYM(x),function
 #define SIZE(x) .size SYM(x), . - SYM(x)
 #define LSYM(x) .x
+#define LLSYM(x) .L##x
 #else
 #define __PLT__
 #define TYPE(x)
 #define SIZE(x)
 #define LSYM(x) x
+#define LLSYM(x) x
 #endif
 
 /* Function end macros.  Variants for interworking.  */
@@ -182,6 +184,16 @@ LSYM(Lend_fde):
 #endif
 .endm
 
+.macro CFI_START_FUNCTION
+        .cfi_startproc
+        .cfi_remember_state
+.endm
+
+.macro CFI_END_FUNCTION
+        .cfi_restore_state
+        .cfi_endproc
+.endm
+
 /* Don't pass dirn, it's there just to get token pasting right.  */
 
 .macro RETLDM regs=, cond=, unwind=, dirn=ia
@@ -324,10 +336,6 @@ LSYM(Lend_fde):
 .endm
 #endif
 
-.macro FUNC_END name
-        SIZE (__\name)
-.endm
-
 .macro DIV_FUNC_END name signed
         cfi_start __\name, LSYM(Lend_div0)
 LSYM(Ldiv0):
@@ -340,48 +348,76 @@ LSYM(Ldiv0):
         FUNC_END \name
 .endm
 
-.macro THUMB_FUNC_START name
-        .globl  SYM (\name)
-        TYPE    (\name)
-        .thumb_func
-SYM (\name):
-.endm
-
 /* Function start macros.  Variants for ARM and Thumb.  */
 
 #ifdef __thumb__
 #define THUMB_FUNC .thumb_func
 #define THUMB_CODE .force_thumb
-# if defined(__thumb2__)
-#define THUMB_SYNTAX
-# else
-#define THUMB_SYNTAX
-# endif
 #else
 #define THUMB_FUNC
 #define THUMB_CODE
-#define THUMB_SYNTAX
 #endif
 
+.macro THUMB_FUNC_START name
+        .globl  SYM (\name)
+        TYPE    (\name)
+        .thumb_func
+SYM (\name):
+.endm
+
+/* Strong global symbol, ".text" section.
+   The default macro for function declarations.  */
 .macro FUNC_START name
-        .text
+        FUNC_START_SECTION \name .text
+.endm
+
+/* Weak global symbol, ".text" section.
+   Use WEAK_* macros to declare a function/object that may be discarded by
+   the linker when another library or object exports the same name.
+   Typically, functions declared with WEAK_* macros implement a subset of
+   functionality provided by the overriding definition, and are discarded
+   when the full functionality is required.  */
+.macro WEAK_START name
+        .weak SYM(__\name)
+        FUNC_START_SECTION \name .text
+.endm
+
+/* Strong global symbol, alternate section.
+   Use the *_START_SECTION macros for declarations that the linker should
+   place in a non-default section (e.g. ".rodata", ".text.subsection").  */
+.macro FUNC_START_SECTION name section
+        .section \section,"x"
+        .align 0
+        FUNC_ENTRY \name
+.endm
+
+/* Weak global symbol, alternate section.  */
+.macro WEAK_START_SECTION name section
+        .weak SYM(__\name)
+        FUNC_START_SECTION \name
[PATCH v7 04/34] Reorganize LIB1ASMFUNCS object wrapper macros
This will make it easier to isolate changes in subsequent patches.

gcc/libgcc/ChangeLog:
2022-10-09  Daniel Engel

        * config/arm/t-elf (LIB1ASMFUNCS): Split macros into logical groups.
---
 libgcc/config/arm/t-elf | 66 +
 1 file changed, 53 insertions(+), 13 deletions(-)

diff --git a/libgcc/config/arm/t-elf b/libgcc/config/arm/t-elf
index 9da6cd37054..93ea1cd8f76 100644
--- a/libgcc/config/arm/t-elf
+++ b/libgcc/config/arm/t-elf
@@ -14,19 +14,59 @@ LIB1ASMFUNCS += _arm_muldf3 _arm_mulsf3
 endif
 endif # !__symbian__
 
-# For most CPUs we have an assembly soft-float implementations.
-# However this is not true for ARMv6M.  Here we want to use the soft-fp C
-# implementation.  The soft-fp code is only build for ARMv6M.  This pulls
-# in the asm implementation for other CPUs.
-LIB1ASMFUNCS += _udivsi3 _divsi3 _umodsi3 _modsi3 _dvmd_tls _bb_init_func \
-        _call_via_rX _interwork_call_via_rX \
-        _lshrdi3 _ashrdi3 _ashldi3 \
-        _arm_negdf2 _arm_addsubdf3 _arm_muldivdf3 _arm_cmpdf2 _arm_unorddf2 \
-        _arm_fixdfsi _arm_fixunsdfsi \
-        _arm_truncdfsf2 _arm_negsf2 _arm_addsubsf3 _arm_muldivsf3 \
-        _arm_cmpsf2 _arm_unordsf2 _arm_fixsfsi _arm_fixunssfsi \
-        _arm_floatdidf _arm_floatdisf _arm_floatundidf _arm_floatundisf \
-        _clzsi2 _clzdi2 _ctzsi2
+# This pulls in the available assembly function implementations.
+# The soft-fp code is only built for ARMv6M, since there is no
+# assembly implementation here for double-precision values.
+
+
+# Group 1: Integer function objects.
+LIB1ASMFUNCS += \
+        _ashldi3 \
+        _ashrdi3 \
+        _lshrdi3 \
+        _clzdi2 \
+        _clzsi2 \
+        _ctzsi2 \
+        _dvmd_tls \
+        _divsi3 \
+        _modsi3 \
+        _udivsi3 \
+        _umodsi3 \
+
+
+# Group 2: Single precision floating point function objects.
+LIB1ASMFUNCS += \
+        _arm_addsubsf3 \
+        _arm_cmpsf2 \
+        _arm_fixsfsi \
+        _arm_fixunssfsi \
+        _arm_floatdisf \
+        _arm_floatundisf \
+        _arm_muldivsf3 \
+        _arm_negsf2 \
+        _arm_unordsf2 \
+
+
+# Group 3: Double precision floating point function objects.
+LIB1ASMFUNCS += \
+        _arm_addsubdf3 \
+        _arm_cmpdf2 \
+        _arm_fixdfsi \
+        _arm_fixunsdfsi \
+        _arm_floatdidf \
+        _arm_floatundidf \
+        _arm_muldivdf3 \
+        _arm_negdf2 \
+        _arm_truncdfsf2 \
+        _arm_unorddf2 \
+
+
+# Group 4: Miscellaneous function objects.
+LIB1ASMFUNCS += \
+        _bb_init_func \
+        _call_via_rX \
+        _interwork_call_via_rX \
+
 
 # Currently there is a bug somewhere in GCC's alias analysis
 # or scheduling code that is breaking _fpmul_parts in fp-bit.c.
--
2.34.1
[PATCH v7 07/34] Refactor 'ctz' functions into a new file
This will make it easier to isolate changes in subsequent patches.

gcc/libgcc/ChangeLog:
2022-10-09  Daniel Engel

        * config/arm/lib1funcs.S (__ctzsi2): Moved to ...
        * config/arm/ctz2.S: New file.
---
 libgcc/config/arm/ctz2.S      | 86 +++
 libgcc/config/arm/lib1funcs.S | 65 +-
 2 files changed, 87 insertions(+), 64 deletions(-)
 create mode 100644 libgcc/config/arm/ctz2.S

diff --git a/libgcc/config/arm/ctz2.S b/libgcc/config/arm/ctz2.S
new file mode 100644
index 000..1d885dcc71a
--- /dev/null
+++ b/libgcc/config/arm/ctz2.S
@@ -0,0 +1,86 @@
+/* Copyright (C) 1995-2022 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 3, or (at your option) any
+later version.
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+General Public License for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+<http://www.gnu.org/licenses/>.  */
+
+
+#ifdef L_ctzsi2
+#ifdef NOT_ISA_TARGET_32BIT
+FUNC_START ctzsi2
+        negs    r1, r0
+        ands    r0, r0, r1
+        movs    r1, #28
+        movs    r3, #1
+        lsls    r3, r3, #16
+        cmp     r0, r3 /* 0x10000 */
+        bcc     2f
+        lsrs    r0, r0, #16
+        subs    r1, r1, #16
+2:      lsrs    r3, r3, #8
+        cmp     r0, r3 /* #0x100 */
+        bcc     2f
+        lsrs    r0, r0, #8
+        subs    r1, r1, #8
+2:      lsrs    r3, r3, #4
+        cmp     r0, r3 /* #0x10 */
+        bcc     2f
+        lsrs    r0, r0, #4
+        subs    r1, r1, #4
+2:      adr     r2, 1f
+        ldrb    r0, [r2, r0]
+        subs    r0, r0, r1
+        bx      lr
+.align 2
+1:
+.byte   27, 28, 29, 29, 30, 30, 30, 30, 31, 31, 31, 31, 31, 31, 31, 31
+        FUNC_END ctzsi2
+#else
+ARM_FUNC_START ctzsi2
+        rsb     r1, r0, #0
+        and     r0, r0, r1
+# if defined (__ARM_FEATURE_CLZ)
+        clz     r0, r0
+        rsb     r0, r0, #31
+        RET
+# else
+        mov     r1, #28
+        cmp     r0, #0x10000
+        do_it   cs, t
+        movcs   r0, r0, lsr #16
+        subcs   r1, r1, #16
+        cmp     r0, #0x100
+        do_it   cs, t
+        movcs   r0, r0, lsr #8
+        subcs   r1, r1, #8
+        cmp     r0, #0x10
+        do_it   cs, t
+        movcs   r0, r0, lsr #4
+        subcs   r1, r1, #4
+        adr     r2, 1f
+        ldrb    r0, [r2, r0]
+        sub     r0, r0, r1
+        RET
+.align 2
+1:
+.byte   27, 28, 29, 29, 30, 30, 30, 30, 31, 31, 31, 31, 31, 31, 31, 31
+# endif /* !defined (__ARM_FEATURE_CLZ) */
+        FUNC_END ctzsi2
+#endif
+#endif /* L_ctzsi2 */
+
diff --git a/libgcc/config/arm/lib1funcs.S b/libgcc/config/arm/lib1funcs.S
index 469fea9ab5c..6cf7561835d 100644
--- a/libgcc/config/arm/lib1funcs.S
+++ b/libgcc/config/arm/lib1funcs.S
@@ -1804,70 +1804,7 @@ LSYM(Lover12):
 #endif /* __symbian__ */
 
 #include "clz2.S"
-
-#ifdef L_ctzsi2
-#ifdef NOT_ISA_TARGET_32BIT
-FUNC_START ctzsi2
-        negs    r1, r0
-        ands    r0, r0, r1
-        movs    r1, #28
-        movs    r3, #1
-        lsls    r3, r3, #16
-        cmp     r0, r3 /* 0x10000 */
-        bcc     2f
-        lsrs    r0, r0, #16
-        subs    r1, r1, #16
-2:      lsrs    r3, r3, #8
-        cmp     r0, r3 /* #0x100 */
-        bcc     2f
-        lsrs    r0, r0, #8
-        subs    r1, r1, #8
-2:      lsrs    r3, r3, #4
-        cmp     r0, r3 /* #0x10 */
-        bcc     2f
-        lsrs    r0, r0, #4
-        subs    r1, r1, #4
-2:      adr     r2, 1f
-        ldrb    r0, [r2, r0]
-        subs    r0, r0, r1
-        bx      lr
-.align 2
-1:
-.byte   27, 28, 29, 29, 30, 30, 30, 30, 31, 31, 31, 31, 31, 31, 31, 31
-        FUNC_END ctzsi2
-#else
-ARM_FUNC_START ctzsi2
-        rsb     r1, r0, #0
-        and     r0, r0, r1
-# if defined (__ARM_FEATURE_CLZ)
-        clz     r0, r0
-        rsb     r0, r0, #31
-        RET
-# else
-        mov     r1, #28
-        cmp     r0, #0x10000
-        do_it   cs, t
-        movcs   r0, r0, lsr #16
-        subcs   r1, r1, #16
-        cmp     r0, #0x100
-        do_it   cs, t
-        movcs   r0, r0, lsr #8
-        subcs   r1, r1, #8
-        cmp     r0, #0x10
-        do_it   cs, t
-        movcs   r0, r0, lsr #4
-        subcs   r1, r1, #4
-        adr     r2, 1f
-        ldrb    r0, [r2, r0]
-        sub     r0, r0, r1
-        RET
-.align 2
-1:
-.byte   27, 28, 29, 29, 30, 30, 30, 30, 31, 31, 31, 31, 31, 31, 31, 31
-# endif /* !defined (_
[PATCH v7 06/34] Refactor 'clz' functions into a new file
This will make it easier to isolate changes in subsequent patches.

gcc/libgcc/ChangeLog:
2022-10-09  Daniel Engel

        * config/arm/lib1funcs.S (__clzsi2, __clzdi2): Moved to ...
        * config/arm/clz2.S: New file.
---
 libgcc/config/arm/clz2.S      | 145 ++
 libgcc/config/arm/lib1funcs.S | 123 +---
 2 files changed, 146 insertions(+), 122 deletions(-)
 create mode 100644 libgcc/config/arm/clz2.S

diff --git a/libgcc/config/arm/clz2.S b/libgcc/config/arm/clz2.S
new file mode 100644
index 000..439341752ba
--- /dev/null
+++ b/libgcc/config/arm/clz2.S
@@ -0,0 +1,145 @@
+/* Copyright (C) 1995-2022 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 3, or (at your option) any
+later version.
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+General Public License for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+<http://www.gnu.org/licenses/>.  */
+
+
+#ifdef L_clzsi2
+#ifdef NOT_ISA_TARGET_32BIT
+FUNC_START clzsi2
+        movs    r1, #28
+        movs    r3, #1
+        lsls    r3, r3, #16
+        cmp     r0, r3 /* 0x10000 */
+        bcc     2f
+        lsrs    r0, r0, #16
+        subs    r1, r1, #16
+2:      lsrs    r3, r3, #8
+        cmp     r0, r3 /* #0x100 */
+        bcc     2f
+        lsrs    r0, r0, #8
+        subs    r1, r1, #8
+2:      lsrs    r3, r3, #4
+        cmp     r0, r3 /* #0x10 */
+        bcc     2f
+        lsrs    r0, r0, #4
+        subs    r1, r1, #4
+2:      adr     r2, 1f
+        ldrb    r0, [r2, r0]
+        adds    r0, r0, r1
+        bx      lr
+.align 2
+1:
+.byte   4, 3, 2, 2, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0
+        FUNC_END clzsi2
+#else
+ARM_FUNC_START clzsi2
+# if defined (__ARM_FEATURE_CLZ)
+        clz     r0, r0
+        RET
+# else
+        mov     r1, #28
+        cmp     r0, #0x10000
+        do_it   cs, t
+        movcs   r0, r0, lsr #16
+        subcs   r1, r1, #16
+        cmp     r0, #0x100
+        do_it   cs, t
+        movcs   r0, r0, lsr #8
+        subcs   r1, r1, #8
+        cmp     r0, #0x10
+        do_it   cs, t
+        movcs   r0, r0, lsr #4
+        subcs   r1, r1, #4
+        adr     r2, 1f
+        ldrb    r0, [r2, r0]
+        add     r0, r0, r1
+        RET
+.align 2
+1:
+.byte   4, 3, 2, 2, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0
+# endif /* !defined (__ARM_FEATURE_CLZ) */
+        FUNC_END clzsi2
+#endif
+#endif /* L_clzsi2 */
+
+#ifdef L_clzdi2
+#if !defined (__ARM_FEATURE_CLZ)
+
+# ifdef NOT_ISA_TARGET_32BIT
+FUNC_START clzdi2
+        push    {r4, lr}
+        cmp     xxh, #0
+        bne     1f
+# ifdef __ARMEB__
+        movs    r0, xxl
+        bl      __clzsi2
+        adds    r0, r0, #32
+        b       2f
+1:
+        bl      __clzsi2
+# else
+        bl      __clzsi2
+        adds    r0, r0, #32
+        b       2f
+1:
+        movs    r0, xxh
+        bl      __clzsi2
+# endif
+2:
+        pop     {r4, pc}
+# else /* NOT_ISA_TARGET_32BIT */
+ARM_FUNC_START clzdi2
+        do_push {r4, lr}
+        cmp     xxh, #0
+        bne     1f
+# ifdef __ARMEB__
+        mov     r0, xxl
+        bl      __clzsi2
+        add     r0, r0, #32
+        b       2f
+1:
+        bl      __clzsi2
+# else
+        bl      __clzsi2
+        add     r0, r0, #32
+        b       2f
+1:
+        mov     r0, xxh
+        bl      __clzsi2
+# endif
+2:
+        RETLDM  r4
+        FUNC_END clzdi2
+# endif /* NOT_ISA_TARGET_32BIT */
+
+#else /* defined (__ARM_FEATURE_CLZ) */
+
+ARM_FUNC_START clzdi2
+        cmp     xxh, #0
+        do_it   eq, et
+        clzeq   r0, xxl
+        clzne   r0, xxh
+        addeq   r0, r0, #32
+        RET
+        FUNC_END clzdi2
+
+#endif
+#endif /* L_clzdi2 */
+
diff --git a/libgcc/config/arm/lib1funcs.S b/libgcc/config/arm/lib1funcs.S
index 7a941ee9fc8..469fea9ab5c 100644
--- a/libgcc/config/arm/lib1funcs.S
+++ b/libgcc/config/arm/lib1funcs.S
@@ -1803,128 +1803,7 @@ LSYM(Lover12):
 #endif /* __symbian__ */
 
-#ifdef L_clzsi2
-#ifdef NOT_ISA_TARGET_32BIT
-FUNC_START clzsi2
-        movs    r1, #28
-        movs    r3, #1
-        lsls    r3, r3, #16
-        cmp     r0, r3 /* 0x10000 */
-        bcc     2f
-        lsrs    r0, r0, #16
-        subs    r1, r1, #16
-2:      lsrs    r3, r3, #8
-        cmp     r0, r3 /* #0x100 */
-        bcc     2f
-        lsrs    r0, r0, #8
-        subs    r1, r1, #8
-2:      lsrs    r3, r3, #4
-        cmp     r0, r3 /* #0x10 */
-        bcc     2f
-        lsrs
[PATCH v7 02/34] Rename THUMB_FUNC_START to THUMB_FUNC_ENTRY
Since THUMB_FUNC_START does not insert the ".text" directive, it aligns
more closely with the new FUNC_ENTRY macro and is renamed accordingly.

THUMB_FUNC_START usage has been universally synonymous with the
".force_thumb" directive, so this is now folded into the definition.
Usage of ".force_thumb" and ".thumb_func" is now tightly coupled
throughout the "arm" subdirectory.

gcc/libgcc/ChangeLog:
2022-10-09  Daniel Engel

        * config/arm/lib1funcs.S:
        (THUMB_FUNC_START): Renamed to ...
        (THUMB_FUNC_ENTRY): for consistency; also added ".force_thumb".
        (_call_via_r0): Removed redundant preceding ".force_thumb".
        (__gnu_thumb1_case_sqi, __gnu_thumb1_case_uqi, __gnu_thumb1_case_shi,
        __gnu_thumb1_case_si): Removed redundant ".force_thumb" and ".syntax".
---
 libgcc/config/arm/lib1funcs.S | 32 +++-
 1 file changed, 11 insertions(+), 21 deletions(-)

diff --git a/libgcc/config/arm/lib1funcs.S b/libgcc/config/arm/lib1funcs.S
index a4fa62b3832..726984a9d1d 100644
--- a/libgcc/config/arm/lib1funcs.S
+++ b/libgcc/config/arm/lib1funcs.S
@@ -358,10 +358,11 @@ LSYM(Ldiv0):
 #define THUMB_CODE
 #endif
 
-.macro THUMB_FUNC_START name
+.macro THUMB_FUNC_ENTRY name
         .globl  SYM (\name)
         TYPE    (\name)
         .thumb_func
+        .force_thumb
 SYM (\name):
 .endm
 
@@ -1944,10 +1945,9 @@ ARM_FUNC_START ctzsi2
 
         .text
         .align 0
-        .force_thumb
 
 .macro call_via register
-        THUMB_FUNC_START _call_via_\register
+        THUMB_FUNC_ENTRY _call_via_\register
 
         bx      \register
         nop
@@ -2030,7 +2030,7 @@ _arm_return_r11:
 .macro interwork_with_frame frame, register, name, return
         .code   16
 
-        THUMB_FUNC_START \name
+        THUMB_FUNC_ENTRY \name
 
         bx      pc
         nop
@@ -2047,7 +2047,7 @@ _arm_return_r11:
 .macro interwork register
         .code   16
 
-        THUMB_FUNC_START _interwork_call_via_\register
+        THUMB_FUNC_ENTRY _interwork_call_via_\register
 
         bx      pc
         nop
@@ -2084,7 +2084,7 @@ LSYM(Lchange_\register):
         /* The LR case has to be handled a little differently...  */
         .code 16
 
-        THUMB_FUNC_START _interwork_call_via_lr
+        THUMB_FUNC_ENTRY _interwork_call_via_lr
 
         bx      pc
         nop
@@ -2112,9 +2112,7 @@ LSYM(Lchange_\register):
 
         .text
         .align 0
-        .force_thumb
-        .syntax unified
-        THUMB_FUNC_START __gnu_thumb1_case_sqi
+        THUMB_FUNC_ENTRY __gnu_thumb1_case_sqi
         push    {r1}
         mov     r1, lr
         lsrs    r1, r1, #1
@@ -2131,9 +2129,7 @@ LSYM(Lchange_\register):
 
         .text
         .align 0
-        .force_thumb
-        .syntax unified
-        THUMB_FUNC_START __gnu_thumb1_case_uqi
+        THUMB_FUNC_ENTRY __gnu_thumb1_case_uqi
         push    {r1}
         mov     r1, lr
         lsrs    r1, r1, #1
@@ -2150,9 +2146,7 @@ LSYM(Lchange_\register):
 
         .text
         .align 0
-        .force_thumb
-        .syntax unified
-        THUMB_FUNC_START __gnu_thumb1_case_shi
+        THUMB_FUNC_ENTRY __gnu_thumb1_case_shi
         push    {r0, r1}
         mov     r1, lr
         lsrs    r1, r1, #1
@@ -2170,9 +2164,7 @@ LSYM(Lchange_\register):
 
         .text
         .align 0
-        .force_thumb
-        .syntax unified
-        THUMB_FUNC_START __gnu_thumb1_case_uhi
+        THUMB_FUNC_ENTRY __gnu_thumb1_case_uhi
         push    {r0, r1}
         mov     r1, lr
         lsrs    r1, r1, #1
@@ -2190,9 +2182,7 @@ LSYM(Lchange_\register):
 
         .text
         .align 0
-        .force_thumb
-        .syntax unified
-        THUMB_FUNC_START __gnu_thumb1_case_si
+        THUMB_FUNC_ENTRY __gnu_thumb1_case_si
         push    {r0, r1}
         mov     r1, lr
         adds.n  r1, r1, #2      /* Align to word.  */
--
2.34.1
[PATCH v7 05/34] Add the __HAVE_FEATURE_IT and IT() macros
These macros complement and extend the existing do_it() macro.
Together, they streamline the process of optimizing short branchless
conditional sequences to support ARM, Thumb-2, and Thumb-1.

The inherent architecture limitations of Thumb-1 mean that writing
assembly code is somewhat more tedious.  And, while such code will run
unmodified in an ARM or Thumb-2 environment, it will lack one of the
key performance optimizations available there.

Initially, the first idea might be to split an instruction sequence
with #ifdef(s): one path for Thumb-1 and the other for ARM/Thumb-2.
This could suffice if conditional execution optimizations were rare.

However, #ifdef(s) break the flow of an algorithm and shift focus to the
architectural differences instead of the similarities.  On functions
with a high percentage of conditional execution, it starts to become
attractive to split everything into distinct architecture-specific
function objects -- even when the underlying algorithm is identical.

Additionally, duplicated code and comments (whether an individual
operand, a line, or a larger block) become a future maintenance
liability if the two versions aren't kept in sync.

See code comments for limitations and expected usage.

gcc/libgcc/ChangeLog:
2022-10-09  Daniel Engel

        * config/arm/lib1funcs.S (__HAVE_FEATURE_IT, IT): New macros.
---
 libgcc/config/arm/lib1funcs.S | 68 +++
 1 file changed, 68 insertions(+)

diff --git a/libgcc/config/arm/lib1funcs.S b/libgcc/config/arm/lib1funcs.S
index f2f82f9d509..7a941ee9fc8 100644
--- a/libgcc/config/arm/lib1funcs.S
+++ b/libgcc/config/arm/lib1funcs.S
@@ -230,6 +230,7 @@ LSYM(Lend_fde):
    ARM and Thumb-2.  However this is only supported by recent gas, so define
    a set of macros to allow ARM code on older assemblers.  */
 #if defined(__thumb2__)
+#define __HAVE_FEATURE_IT
 .macro do_it cond, suffix=""
         it\suffix       \cond
 .endm
@@ -245,6 +246,9 @@ LSYM(Lend_fde):
         \name \dest, \src1, \tmp
 .endm
 #else
+#if !defined(__thumb__)
+#define __HAVE_FEATURE_IT
+#endif
 .macro do_it cond, suffix=""
 .endm
 .macro shift1 op, arg0, arg1, arg2
@@ -259,6 +263,70 @@ LSYM(Lend_fde):
 
 #define COND(op1, op2, cond) op1 ## op2 ## cond
 
+
+/* The IT() macro streamlines the construction of short branchless conditional
+   sequences that support ARM, Thumb-2, and Thumb-1.  It is intended as an
+   extension to the .do_it macro defined above.  Code not written with the
+   intent to support Thumb-1 need not use IT().
+
+   IT()'s main advantage is the minimization of syntax differences.  Unified
+   functions can support Thumb-1 without imposing an undue performance
+   penalty on ARM and Thumb-2.  Writing code without duplicate instructions
+   and operands keeps the high level function flow clearer and should reduce
+   the incidence of maintenance bugs.
+
+   Where conditional execution is supported by ARM and Thumb-2, the specified
+   instruction compiles with the conditional suffix 'c'.
+
+   Where Thumb-1 and v6m do not support IT, the given instruction compiles
+   with the standard unified syntax suffix "s", and a preceding branch
+   instruction is required to implement conditional behavior.
+
+   (Aside: The Thumb-1 "s"-suffix pattern is somewhat simplistic, since it
+   does not support 'cmp' or 'tst' with a non-"s" suffix.  It also appends
+   "s" to 'mov' and 'add' with high register operands which are otherwise
+   legal on v6m.  Use of IT() will result in a compiler error for all of
+   these exceptional cases, and a full #ifdef code split will be required.
+   However, it is unlikely that code written with Thumb-1 compatibility
+   in mind will use such patterns, so IT() still promises a good value.)
+
+   Typical if/then/else usage is:
+
+    #ifdef __HAVE_FEATURE_IT
+        // ARM and Thumb-2 'true' condition.
+        do_it   c, tee
+    #else
+        // Thumb-1 'false' condition.  This must be opposite the
+        //  sense of the ARM and Thumb-2 condition, since the
+        //  branch is taken to skip the 'true' instruction block.
+        b!c     else_label
+    #endif
+
+        // Conditional 'true' execution for all compile modes.
+        IT(ins1,c)      op1,    op2
+        IT(ins2,c)      op1,    op2
+
+    #ifndef __HAVE_FEATURE_IT
+        // Thumb-1 branch to skip the 'else' instruction block.
+        // Omitted for if/then usage.
+        b       end_label
+    #endif
+
+    else_label:
+        // Conditional 'false' execution for all compile modes.
+        // Omitted for if/then usage.
+        IT(ins3,!c)     op1,    op2
+        IT(ins4,!c)     op1,    op2
+
+    end_label:
+        // Unconditional execution resumes here.
+ */
+#ifdef __HAVE_FEATURE_IT
+  #define IT(ins,c) ins##c
+#else
+  #define IT(ins,c) ins##s
+#endif
+
 #ifdef __ARM_EABI__
 .macro ARM_LDIV0 name signed
         cmp     r0, #0
--
2.34.1
[PATCH v7 15/34] Import 'popcnt' functions from the CM0 library
The functional overlap between the single- and double-word functions
makes this implementation about 30% smaller than the C functions if
both functions are linked together in the same application.

gcc/libgcc/ChangeLog:
2022-10-09  Daniel Engel

        * config/arm/popcnt.S (__popcountsi2, __popcountdi2): New file.
        * config/arm/lib1funcs.S: #include popcnt.S.
        * config/arm/t-elf (LIB1ASMFUNCS): Add _popcountsi2/di2.
---
 libgcc/config/arm/lib1funcs.S |   1 +
 libgcc/config/arm/popcnt.S    | 189 ++
 libgcc/config/arm/t-elf       |   2 +
 3 files changed, 192 insertions(+)
 create mode 100644 libgcc/config/arm/popcnt.S

diff --git a/libgcc/config/arm/lib1funcs.S b/libgcc/config/arm/lib1funcs.S
index 3f7b9e739f0..0eb6d1d52a7 100644
--- a/libgcc/config/arm/lib1funcs.S
+++ b/libgcc/config/arm/lib1funcs.S
@@ -1705,6 +1705,7 @@ LSYM(Lover12):
 #include "clz2.S"
 #include "ctz2.S"
 #include "parity.S"
+#include "popcnt.S"
 
 /* */
 /* These next two sections are here despite the fact that they contain Thumb

diff --git a/libgcc/config/arm/popcnt.S b/libgcc/config/arm/popcnt.S
new file mode 100644
index 000..4613ea475b0
--- /dev/null
+++ b/libgcc/config/arm/popcnt.S
@@ -0,0 +1,189 @@
+/* popcnt.S: ARM optimized popcount functions
+
+   Copyright (C) 2020-2022 Free Software Foundation, Inc.
+   Contributed by Daniel Engel (g...@danielengel.com)
+
+   This file is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by the
+   Free Software Foundation; either version 3, or (at your option) any
+   later version.
+
+   This file is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+
+#ifdef L_popcountdi2
+
+// int __popcountdi2(long long)
+// Returns the number of bits set in $r1:$r0.
+// Returns the result in $r0.
+FUNC_START_SECTION popcountdi2 .text.sorted.libgcc.popcountdi2
+    CFI_START_FUNCTION
+
+  #if defined(__OPTIMIZE_SIZE__) && __OPTIMIZE_SIZE__
+        // Initialize the result.
+        // Compensate for the two extra loop iterations (one for each word)
+        //  required to detect zero arguments.
+        movs    r2,     #2
+
+    LLSYM(__popcountd_loop):
+        // Same as __popcounts_loop below, except for $r1.
+        subs    r2,     #1
+        subs    r3,     r1,     #1
+        ands    r1,     r3
+        bcs     LLSYM(__popcountd_loop)
+
+        // Repeat the operation for the second word.
+        b       LLSYM(__popcounts_loop)
+
+  #else /* !__OPTIMIZE_SIZE__ */
+        // Load the one-bit alternating mask.
+        ldr     r3,     =0x55555555
+
+        // Reduce the second word.
+        lsrs    r2,     r1,     #1
+        ands    r2,     r3
+        subs    r1,     r2
+
+        // Reduce the first word.
+        lsrs    r2,     r0,     #1
+        ands    r2,     r3
+        subs    r0,     r2
+
+        // Load the two-bit alternating mask.
+        ldr     r3,     =0x33333333
+
+        // Reduce the second word.
+        lsrs    r2,     r1,     #2
+        ands    r2,     r3
+        ands    r1,     r3
+        adds    r1,     r2
+
+        // Reduce the first word.
+        lsrs    r2,     r0,     #2
+        ands    r2,     r3
+        ands    r0,     r3
+        adds    r0,     r2
+
+        // There will be a maximum of 8 bits in each 4-bit field.
+        // Jump into the single word flow to combine and complete.
+        b       LLSYM(__popcounts_merge)
+
+  #endif /* !__OPTIMIZE_SIZE__ */
+#endif /* L_popcountdi2 */
+
+
+// The implementation of __popcountdi2() tightly couples with __popcountsi2(),
+// such that instructions must appear consecutively in the same memory
+// section for proper flow control.  However, this construction inhibits
+// the ability to discard __popcountdi2() when only using __popcountsi2().
+// Therefore, this block configures __popcountsi2() for compilation twice.
+// The first version is a minimal standalone implementation, and the second
+// version is the continuation of __popcountdi2().  The standalone version must
+// be declared WEAK, so that the combined version can supersede it and
+// provide both symbols when required.
+// '_popcountsi2' should appear b
[PATCH v7 08/34] Refactor 64-bit shift functions into a new file
This will make it easier to isolate changes in subsequent patches.

gcc/libgcc/ChangeLog:
2022-10-09  Daniel Engel

        * config/arm/lib1funcs.S (__ashldi3, __ashrdi3, __lshrdi3):
        Moved to ...
        * config/arm/eabi/lshift.S: New file.
---
 libgcc/config/arm/eabi/lshift.S | 123 ++++
 libgcc/config/arm/lib1funcs.S   | 103 +-
 2 files changed, 124 insertions(+), 102 deletions(-)
 create mode 100644 libgcc/config/arm/eabi/lshift.S

diff --git a/libgcc/config/arm/eabi/lshift.S b/libgcc/config/arm/eabi/lshift.S
new file mode 100644
index 000..6e79d96c118
--- /dev/null
+++ b/libgcc/config/arm/eabi/lshift.S
@@ -0,0 +1,123 @@
+/* Copyright (C) 1995-2022 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 3, or (at your option) any
+later version.
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+General Public License for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+<http://www.gnu.org/licenses/>.  */
+
+
+#ifdef L_lshrdi3
+
+        FUNC_START lshrdi3
+        FUNC_ALIAS aeabi_llsr lshrdi3
+
+#ifdef __thumb__
+        lsrs    al, r2
+        movs    r3, ah
+        lsrs    ah, r2
+        mov     ip, r3
+        subs    r2, #32
+        lsrs    r3, r2
+        orrs    al, r3
+        negs    r2, r2
+        mov     r3, ip
+        lsls    r3, r2
+        orrs    al, r3
+        RET
+#else
+        subs    r3, r2, #32
+        rsb     ip, r2, #32
+        movmi   al, al, lsr r2
+        movpl   al, ah, lsr r3
+        orrmi   al, al, ah, lsl ip
+        mov     ah, ah, lsr r2
+        RET
+#endif
+        FUNC_END aeabi_llsr
+        FUNC_END lshrdi3
+
+#endif
+
+#ifdef L_ashrdi3
+
+        FUNC_START ashrdi3
+        FUNC_ALIAS aeabi_lasr ashrdi3
+
+#ifdef __thumb__
+        lsrs    al, r2
+        movs    r3, ah
+        asrs    ah, r2
+        subs    r2, #32
+        @ If r2 is negative at this point the following step would OR
+        @ the sign bit into all of AL.  That's not what we want...
+        bmi     1f
+        mov     ip, r3
+        asrs    r3, r2
+        orrs    al, r3
+        mov     r3, ip
+1:
+        negs    r2, r2
+        lsls    r3, r2
+        orrs    al, r3
+        RET
+#else
+        subs    r3, r2, #32
+        rsb     ip, r2, #32
+        movmi   al, al, lsr r2
+        movpl   al, ah, asr r3
+        orrmi   al, al, ah, lsl ip
+        mov     ah, ah, asr r2
+        RET
+#endif
+
+        FUNC_END aeabi_lasr
+        FUNC_END ashrdi3
+
+#endif
+
+#ifdef L_ashldi3
+
+        FUNC_START ashldi3
+        FUNC_ALIAS aeabi_llsl ashldi3
+
+#ifdef __thumb__
+        lsls    ah, r2
+        movs    r3, al
+        lsls    al, r2
+        mov     ip, r3
+        subs    r2, #32
+        lsls    r3, r2
+        orrs    ah, r3
+        negs    r2, r2
+        mov     r3, ip
+        lsrs    r3, r2
+        orrs    ah, r3
+        RET
+#else
+        subs    r3, r2, #32
+        rsb     ip, r2, #32
+        movmi   ah, ah, lsl r2
+        movpl   ah, al, lsl r3
+        orrmi   ah, ah, al, lsr ip
+        mov     al, al, lsl r2
+        RET
+#endif
+        FUNC_END aeabi_llsl
+        FUNC_END ashldi3
+
+#endif
+
diff --git a/libgcc/config/arm/lib1funcs.S b/libgcc/config/arm/lib1funcs.S
index 6cf7561835d..aa5957b8399 100644
--- a/libgcc/config/arm/lib1funcs.S
+++ b/libgcc/config/arm/lib1funcs.S
@@ -1699,108 +1699,7 @@ LSYM(Lover12):
 /* Prevent __aeabi double-word shifts from being produced on SymbianOS.  */
 #ifndef __symbian__
-
-#ifdef L_lshrdi3
-
-        FUNC_START lshrdi3
-        FUNC_ALIAS aeabi_llsr lshrdi3
-
-#ifdef __thumb__
-        lsrs    al, r2
-        movs    r3, ah
-        lsrs    ah, r2
-        mov     ip, r3
-        subs    r2, #32
-        lsrs    r3, r2
-        orrs    al, r3
-        negs    r2, r2
-        mov     r3, ip
-        lsls    r3, r2
-        orrs    al, r3
-        RET
-#else
-        subs    r3, r2, #32
-        rsb     ip, r2, #32
-        movmi   al, al, lsr r2
-        movpl   al, ah, lsr r3
-        orrmi   al, al, ah, lsl ip
-        mov     ah, ah, lsr r2
-        RET
-#endif
-        FUNC_END aeabi_llsr
-        FUNC_END lshrdi3
-
-#endif
-
-#ifdef L_ashrdi3
-
-        FUNC_START ashrdi3
-        FUNC_ALIAS aeabi_lasr ashrdi3
-
-#ifdef __thumb__
-        lsrs    al, r2
-        movs    r3, ah
-        asrs    ah, r2
-        subs
[PATCH v7 03/34] Fix syntax warnings on conditional instructions
gcc/libgcc/ChangeLog:
2022-10-09  Daniel Engel

        * config/arm/lib1funcs.S (RETLDM, ARM_DIV_BODY, ARM_MOD_BODY,
        _interwork_call_via_lr): Moved condition code after the flags
        update specifier "s".
        (ARM_FUNC_START, THUMB_LDIV0): Removed redundant ".syntax".
---
 libgcc/config/arm/lib1funcs.S | 12 +---
 1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/libgcc/config/arm/lib1funcs.S b/libgcc/config/arm/lib1funcs.S
index 726984a9d1d..f2f82f9d509 100644
--- a/libgcc/config/arm/lib1funcs.S
+++ b/libgcc/config/arm/lib1funcs.S
@@ -204,7 +204,7 @@ LSYM(Lend_fde):
 # if defined(__thumb2__)
         pop\cond        {\regs, lr}
 # else
-        ldm\cond\dirn   sp!, {\regs, lr}
+        ldm\dirn\cond   sp!, {\regs, lr}
 # endif
         .endif
         .ifnc "\unwind", ""
@@ -220,7 +220,7 @@ LSYM(Lend_fde):
 # if defined(__thumb2__)
         pop\cond        {\regs, pc}
 # else
-        ldm\cond\dirn   sp!, {\regs, pc}
+        ldm\dirn\cond   sp!, {\regs, pc}
 # endif
         .endif
 #endif
@@ -292,7 +292,6 @@ LSYM(Lend_fde):
         pop     {r1, pc}
 
 #elif defined(__thumb2__)
-        .syntax unified
         .ifc \signed, unsigned
         cbz     r0, 1f
         mov     r0, #0xffffffff
@@ -429,7 +428,6 @@ SYM (__\name):
 /* For Thumb-2 we build everything in thumb mode.  */
 .macro ARM_FUNC_START name
         FUNC_START \name
-        .syntax unified
 .endm
 #define EQUIV .thumb_set
 .macro ARM_CALL name
@@ -643,7 +641,7 @@ pc      .req    r15
         orrhs   \result,   \result,   \curbit,  lsr #3
         cmp     \dividend, #0                   @ Early termination?
         do_it   ne, t
-        movnes  \curbit,   \curbit,  lsr #4     @ No, any more bits to do?
+        movsne  \curbit,   \curbit,  lsr #4     @ No, any more bits to do?
         movne   \divisor,  \divisor, lsr #4
         bne     1b
 
@@ -745,7 +743,7 @@ pc      .req    r15
         subhs   \dividend, \dividend, \divisor, lsr #3
         cmp     \dividend, #1
         mov     \divisor, \divisor, lsr #4
-        subges  \order, \order, #4
+        subsge  \order, \order, #4
         bge     1b
 
         tst     \order, #3
@@ -2093,7 +2091,7 @@ LSYM(Lchange_\register):
         .globl .Lchange_lr
 .Lchange_lr:
         tst     lr, #1
-        stmeqdb r13!, {lr, pc}
+        stmdbeq r13!, {lr, pc}
         mov     ip, lr
         adreq   lr, _arm_return
         bx      ip
--
2.34.1
[PATCH v7 09/34] Import 'clz' functions from the CM0 library
On architectures without __ARM_FEATURE_CLZ, this version combines
__clzdi2() with __clzsi2() into a single object with an efficient tail
call.  Also, this version merges the formerly separate Thumb and ARM code
implementations into a unified instruction sequence.  This change
significantly improves Thumb performance without affecting ARM
performance.  Finally, this version adds a new __OPTIMIZE_SIZE__ build
option (binary search loop).

There is no change to the code for architectures with __ARM_FEATURE_CLZ.

gcc/libgcc/ChangeLog:
2022-10-09  Daniel Engel

        * config/arm/bits/clz2.S (__clzsi2, __clzdi2): Reduced code size on
        architectures without __ARM_FEATURE_CLZ.
        * config/arm/t-elf (LIB1ASMFUNCS): Moved _clzsi2 to new weak group.
---
 libgcc/config/arm/clz2.S | 363 +--
 libgcc/config/arm/t-elf  |   7 +-
 2 files changed, 237 insertions(+), 133 deletions(-)

diff --git a/libgcc/config/arm/clz2.S b/libgcc/config/arm/clz2.S
index 439341752ba..ed04698fef4 100644
--- a/libgcc/config/arm/clz2.S
+++ b/libgcc/config/arm/clz2.S
@@ -1,145 +1,244 @@
-/* Copyright (C) 1995-2022 Free Software Foundation, Inc.
+/* clz2.S: Cortex M0 optimized 'clz' functions
 
-This file is free software; you can redistribute it and/or modify it
-under the terms of the GNU General Public License as published by the
-Free Software Foundation; either version 3, or (at your option) any
-later version.
+   Copyright (C) 2018-2022 Free Software Foundation, Inc.
+   Contributed by Daniel Engel (g...@danielengel.com)
 
-This file is distributed in the hope that it will be useful, but
-WITHOUT ANY WARRANTY; without even the implied warranty of
-MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-General Public License for more details.
+   This file is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by the
+   Free Software Foundation; either version 3, or (at your option) any
+   later version.
 
-Under Section 7 of GPL version 3, you are granted additional
-permissions described in the GCC Runtime Library Exception, version
-3.1, as published by the Free Software Foundation.
+   This file is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
 
-You should have received a copy of the GNU General Public License and
-a copy of the GCC Runtime Library Exception along with this program;
-see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
-<http://www.gnu.org/licenses/>.  */
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+
+#if defined(__ARM_FEATURE_CLZ) && __ARM_FEATURE_CLZ
+
+#ifdef L_clzdi2
+
+// int __clzdi2(long long)
+// Counts leading zero bits in $r1:$r0.
+// Returns the result in $r0.
+FUNC_START_SECTION clzdi2 .text.sorted.libgcc.clz2.clzdi2
+    CFI_START_FUNCTION
+
+        // Moved here from lib1funcs.S
+        cmp     xxh,    #0
+        do_it   eq, et
+        clzeq   r0,     xxl
+        clzne   r0,     xxh
+        addeq   r0,     #32
+        RET
+
+    CFI_END_FUNCTION
+FUNC_END clzdi2
+
+#endif /* L_clzdi2 */
 
 #ifdef L_clzsi2
-#ifdef NOT_ISA_TARGET_32BIT
-FUNC_START clzsi2
-        movs    r1, #28
-        movs    r3, #1
-        lsls    r3, r3, #16
-        cmp     r0, r3 /* 0x10000 */
-        bcc     2f
-        lsrs    r0, r0, #16
-        subs    r1, r1, #16
-2:      lsrs    r3, r3, #8
-        cmp     r0, r3 /* #0x100 */
-        bcc     2f
-        lsrs    r0, r0, #8
-        subs    r1, r1, #8
-2:      lsrs    r3, r3, #4
-        cmp     r0, r3 /* #0x10 */
-        bcc     2f
-        lsrs    r0, r0, #4
-        subs    r1, r1, #4
-2:      adr     r2, 1f
-        ldrb    r0, [r2, r0]
-        adds    r0, r0, r1
-        bx      lr
-.align 2
-1:
-.byte   4, 3, 2, 2, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0
-        FUNC_END clzsi2
-#else
-ARM_FUNC_START clzsi2
-# if defined (__ARM_FEATURE_CLZ)
-        clz     r0, r0
-        RET
-# else
-        mov     r1, #28
-        cmp     r0, #0x10000
-        do_it   cs, t
-        movcs   r0, r0, lsr #16
-        subcs   r1, r1, #16
-        cmp     r0, #0x100
-        do_it   cs, t
-        movcs   r0, r0, lsr #8
-        subcs   r1, r1, #8
-        cmp     r0, #0x10
-        do_it   cs, t
-        movcs   r0, r0, lsr #4
-        subcs   r1, r1, #4
-        adr     r2, 1f
-        ldrb    r0, [r2, r0]
-        add     r0, r0, r1
-        RET
-.align
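[Editorially hedged aside: a C sketch of the table-driven sequence used on
architectures without __ARM_FEATURE_CLZ may help readers follow the
assembly.  The name clz_model is hypothetical; this models the Thumb-1
path, not the single-instruction 'clz' path.]

    #include <assert.h>

    /* Normalize the argument into the low 4 bits in 16/8/4-bit steps,
       then finish with a 16-entry table, as the assembly above does.  */
    unsigned int clz_model (unsigned int x)
    {
      static const unsigned char table[16] =
          { 4, 3, 2, 2, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0 };
      unsigned int n = 28;
      if (x >= (1u << 16)) { x >>= 16; n -= 16; }
      if (x >= (1u << 8))  { x >>= 8;  n -= 8;  }
      if (x >= (1u << 4))  { x >>= 4;  n -= 4;  }
      return table[x] + n;  /* clz_model (0) == 32 */
    }

    int main (void)
    {
      assert (clz_model (0) == 32);
      assert (clz_model (1) == 31);
      assert (clz_model (0x80000000u) == 0);
      return 0;
    }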
[PATCH v7 17/34] Import 64-bit comparison from CM0 library
These are 2-5 instructions smaller and just as fast.  Branches are
minimized, which will allow easier adaptation to Thumb-2/ARM mode.

gcc/libgcc/ChangeLog:
2022-10-09  Daniel Engel

        * config/arm/eabi/lcmp.S (__aeabi_lcmp, __aeabi_ulcmp): Replaced;
        add macro configuration to build __cmpdi2() and __ucmpdi2().
        * config/arm/t-elf (LIB1ASMFUNCS): Added _cmpdi2 and _ucmpdi2.
---
 libgcc/config/arm/eabi/lcmp.S | 151 +-
 libgcc/config/arm/t-elf       |   2 +
 2 files changed, 112 insertions(+), 41 deletions(-)

diff --git a/libgcc/config/arm/eabi/lcmp.S b/libgcc/config/arm/eabi/lcmp.S
index 336db1d398c..99c7970ecba 100644
--- a/libgcc/config/arm/eabi/lcmp.S
+++ b/libgcc/config/arm/eabi/lcmp.S
@@ -1,8 +1,7 @@
-/* Miscellaneous BPABI functions.  Thumb-1 implementation, suitable for ARMv4T,
-   ARMv6-M and ARMv8-M Baseline like ISA variants.
+/* lcmp.S: Thumb-1 optimized 64-bit integer comparison
 
-   Copyright (C) 2006-2020 Free Software Foundation, Inc.
-   Contributed by CodeSourcery.
+   Copyright (C) 2018-2022 Free Software Foundation, Inc.
+   Contributed by Daniel Engel, Senva Inc (g...@danielengel.com)
 
    This file is free software; you can redistribute it and/or modify it
    under the terms of the GNU General Public License as published by the
@@ -24,50 +23,120 @@
    <http://www.gnu.org/licenses/>.  */
 
 
+#if defined(L_aeabi_lcmp) || defined(L_cmpdi2)
+
 #ifdef L_aeabi_lcmp
+  #define LCMP_NAME aeabi_lcmp
+  #define LCMP_SECTION .text.sorted.libgcc.lcmp
+#else
+  #define LCMP_NAME cmpdi2
+  #define LCMP_SECTION .text.sorted.libgcc.cmpdi2
+#endif
+
+// int __aeabi_lcmp(long long, long long)
+// int __cmpdi2(long long, long long)
+// Compares the 64 bit signed values in $r1:$r0 and $r3:$r2.
+// lcmp() returns $r0 = { -1, 0, +1 } for orderings { <, ==, > } respectively.
+// cmpdi2() returns $r0 = { 0, 1, 2 } for orderings { <, ==, > } respectively.
+// Object file duplication assumes typical programs follow one runtime ABI.
+FUNC_START_SECTION LCMP_NAME LCMP_SECTION
+    CFI_START_FUNCTION
+
+        // Calculate the difference $r1:$r0 - $r3:$r2.
+        subs    xxl,    yyl
+        sbcs    xxh,    yyh
+
+        // With $r2 free, create a known offset value without affecting
+        //  the N or Z flags.
+        // BUG? The originally unified instruction for v6m was 'mov r2, r3'.
+        //  However, this resulted in a compile error with -mthumb:
+        //    "MOV Rd, Rs with two low registers not permitted".
+        //  Since unified syntax deprecates the "cpy" instruction, shouldn't
+        //  there be a backwards-compatible translation available?
+        cpy     r2,     r3
+
+        // Evaluate the comparison result.
+        blt     LLSYM(__lcmp_lt)
+
+        // The reference offset ($r2 - $r3) will be +2 iff the first
+        //  argument is larger, otherwise the offset value remains 0.
+        adds    r2,     #2
+
+        // Check for zero (equality in 64 bits).
+        // It doesn't matter which register was originally "hi".
+        orrs    r0,     r1
+
+        // The result is already 0 on equality.
+        beq     LLSYM(__lcmp_return)
+
+    LLSYM(__lcmp_lt):
+        // Create +1 or -1 from the offset value defined earlier.
+        adds    r3,     #1
+        subs    r0,     r2,     r3
+
+    LLSYM(__lcmp_return):
+      #ifdef L_cmpdi2
+        // Offset to the correct output specification.
+        adds    r0,     #1
+      #endif
 
-FUNC_START aeabi_lcmp
-        cmp     xxh, yyh
-        beq     1f
-        bgt     2f
-        movs    r0, #1
-        negs    r0, r0
-        RET
-2:
-        movs    r0, #1
-        RET
-1:
-        subs    r0, xxl, yyl
-        beq     1f
-        bhi     2f
-        movs    r0, #1
-        negs    r0, r0
-        RET
-2:
-        movs    r0, #1
-1:
-        RET
-        FUNC_END aeabi_lcmp
-#endif /* L_aeabi_lcmp */
+        RET
+
+    CFI_END_FUNCTION
+FUNC_END LCMP_NAME
+
+#endif /* L_aeabi_lcmp || L_cmpdi2 */
+
+
+#if defined(L_aeabi_ulcmp) || defined(L_ucmpdi2)
 
 #ifdef L_aeabi_ulcmp
+  #define ULCMP_NAME aeabi_ulcmp
+  #define ULCMP_SECTION .text.sorted.libgcc.ulcmp
+#else
+  #define ULCMP_NAME ucmpdi2
+  #define ULCMP_SECTION .text.sorted.libgcc.ucmpdi2
+#endif
+
+// int __aeabi_ulcmp(unsigned long long, unsigned long long)
+// int __ucmpdi2(unsigned long long, unsigned long long)
+// Compares the 64 bit unsigned values in $r1:$r0 and $r3:$r2.
+// ulcmp() returns $r0 = { -1, 0, +1 } for orderings { <, ==, > } respectively.
+// ucmpdi2() returns $r0 = { 0, 1, 2 } for orderings { <, ==, > } respectively.
+// Object file duplication assumes typical programs follow one runtime ABI.
+FUNC_START_SECTION ULCMP_NAME ULCMP_SECTION
+    CFI_START_FUNCTION
+
+        // Calculate the 'C' flag.
+        subs    xxl,    yyl
+        sbcs    xxh,    yyh
+
+        // Capture the carry flag.
+        // $r2 will contain -1 if the first value is smaller,
+        //  0 if the first value is larger or equal.
+        sbcs    r2,     r2
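[Editorially hedged aside: the two return conventions implemented above
can be summarized with a small C model.  The *_model names are
hypothetical; they restate the function comments, not the register-level
implementation.]

    #include <assert.h>

    int lcmp_model (long long a, long long b)    /* __aeabi_lcmp */
    {
      return (a < b) ? -1 : (a > b) ? 1 : 0;
    }

    int cmpdi2_model (long long a, long long b)  /* __cmpdi2 */
    {
      /* The "+1" offset applied in the L_cmpdi2 epilogue above.  */
      return lcmp_model (a, b) + 1;
    }

    int main (void)
    {
      assert (lcmp_model (-1, 0) == -1 && cmpdi2_model (-1, 0) == 0);
      assert (lcmp_model ( 0, 0) ==  0 && cmpdi2_model ( 0, 0) == 1);
      assert (lcmp_model ( 5, 2) ==  1 && cmpdi2_model ( 5, 2) == 2);
      return 0;
    }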
[PATCH v7 10/34] Import 'ctz' functions from the CM0 library
This version combines __ctzdi2() with __ctzsi2() into a single object with
an efficient tail call.  The former implementation of __ctzdi2() was in C.

On architectures without __ARM_FEATURE_CLZ, this version merges the
formerly separate Thumb and ARM code sequences into a unified instruction
sequence.  This change significantly improves Thumb performance without
affecting ARM performance.  Finally, this version adds a new
__OPTIMIZE_SIZE__ build option.

On architectures with __ARM_FEATURE_CLZ, __ctzsi2(0) now returns 32.
Formerly, __ctzsi2(0) would return -1.  Architectures without
__ARM_FEATURE_CLZ have always returned 32, so this change makes the return
value consistent.  This change costs 2 extra instructions (branchless).
Likewise on architectures with __ARM_FEATURE_CLZ, __ctzdi2(0) now returns
64 instead of 31.

gcc/libgcc/ChangeLog:
2022-10-09  Daniel Engel

        * config/arm/bits/ctz2.S (__ctzdi2): Added a new function.
        (__ctzsi2): Reduced size on architectures without __ARM_FEATURE_CLZ;
        changed so __ctzsi2(0)=32 on architectures with __ARM_FEATURE_CLZ.
        * config/arm/t-elf (LIB1ASMFUNCS): Added _ctzdi2; moved _ctzsi2 to
        the weak function objects group.
---
 libgcc/config/arm/ctz2.S | 308 +--
 libgcc/config/arm/t-elf  |   3 +-
 2 files changed, 233 insertions(+), 78 deletions(-)

diff --git a/libgcc/config/arm/ctz2.S b/libgcc/config/arm/ctz2.S
index 1d885dcc71a..82c81c6ae11 100644
--- a/libgcc/config/arm/ctz2.S
+++ b/libgcc/config/arm/ctz2.S
@@ -1,86 +1,240 @@
-/* Copyright (C) 1995-2022 Free Software Foundation, Inc.
+/* ctz2.S: ARM optimized 'ctz' functions
 
-This file is free software; you can redistribute it and/or modify it
-under the terms of the GNU General Public License as published by the
-Free Software Foundation; either version 3, or (at your option) any
-later version.
+   Copyright (C) 2020-2022 Free Software Foundation, Inc.
+   Contributed by Daniel Engel (g...@danielengel.com)
 
-This file is distributed in the hope that it will be useful, but
-WITHOUT ANY WARRANTY; without even the implied warranty of
-MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-General Public License for more details.
+   This file is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by the
+   Free Software Foundation; either version 3, or (at your option) any
+   later version.
 
-Under Section 7 of GPL version 3, you are granted additional
-permissions described in the GCC Runtime Library Exception, version
-3.1, as published by the Free Software Foundation.
+   This file is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
 
-You should have received a copy of the GNU General Public License and
-a copy of the GCC Runtime Library Exception along with this program;
-see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
-<http://www.gnu.org/licenses/>.  */
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
 
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
 
-#ifdef L_ctzsi2
-#ifdef NOT_ISA_TARGET_32BIT
-FUNC_START ctzsi2
-        negs    r1, r0
-        ands    r0, r0, r1
-        movs    r1, #28
-        movs    r3, #1
-        lsls    r3, r3, #16
-        cmp     r0, r3 /* 0x10000 */
-        bcc     2f
-        lsrs    r0, r0, #16
-        subs    r1, r1, #16
-2:      lsrs    r3, r3, #8
-        cmp     r0, r3 /* #0x100 */
-        bcc     2f
-        lsrs    r0, r0, #8
-        subs    r1, r1, #8
-2:      lsrs    r3, r3, #4
-        cmp     r0, r3 /* #0x10 */
-        bcc     2f
-        lsrs    r0, r0, #4
-        subs    r1, r1, #4
-2:      adr     r2, 1f
-        ldrb    r0, [r2, r0]
-        subs    r0, r0, r1
-        bx      lr
-.align 2
-1:
-.byte   27, 28, 29, 29, 30, 30, 30, 30, 31, 31, 31, 31, 31, 31, 31, 31
-        FUNC_END ctzsi2
+
+// When the hardware 'clz' function is available, an efficient version
+// of __ctzsi2(x) can be created by calculating '31 - __clzsi2(lsb(x))',
+// where lsb(x) is 'x' with only the least-significant '1' bit set.
+// The following offset applies to all of the functions in this file.
+#if defined(__ARM_FEATURE_CLZ) && __ARM_FEATURE_CLZ
+  #define CTZ_RESULT_OFFSET 1
 #else
-ARM_FUNC_START ctzsi2
-        rsb     r1, r0, #0
-        and     r0, r0, r1
-# if defined (__ARM_FEATURE_CLZ)
-        clz     r0, r0
-        rsb
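[Editorially hedged aside: the derivation in the new comment above --
ctz(x) == 31 - clz(lsb(x)), with the zero case pinned to 32 -- can be
checked with a short C model.  The name ctz_model is hypothetical.]

    #include <assert.h>

    unsigned int ctz_model (unsigned int x)
    {
      if (x == 0)
        return 32;                  /* the new, consistent zero result */
      unsigned int lsb = x & -x;    /* keep only the lowest set bit */
      return 31 - (unsigned int) __builtin_clz (lsb);
    }

    int main (void)
    {
      assert (ctz_model (0) == 32);
      assert (ctz_model (1) == 0);
      assert (ctz_model (0x80000000u) == 31);
      assert (ctz_model (0x00000028u) == 3);
      return 0;
    }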
[PATCH v7 11/34] Import 64-bit shift functions from the CM0 library
The Thumb versions of these functions are each 1-2 instructions smaller
and faster, and branchless when the IT instruction is available.

The ARM versions were converted to the "xxl/xxh" big-endian register
naming convention, but are otherwise unchanged.

gcc/libgcc/ChangeLog:
2022-10-09  Daniel Engel

        * config/arm/bits/shift.S (__ashldi3, __ashrdi3, __lshrdi3):
        Reduced code size on Thumb architectures;
        updated big-endian register naming convention to "xxl/xxh".
---
 libgcc/config/arm/eabi/lshift.S | 338 +---
 1 file changed, 228 insertions(+), 110 deletions(-)

diff --git a/libgcc/config/arm/eabi/lshift.S b/libgcc/config/arm/eabi/lshift.S
index 6e79d96c118..365350dfb2d 100644
--- a/libgcc/config/arm/eabi/lshift.S
+++ b/libgcc/config/arm/eabi/lshift.S
@@ -1,123 +1,241 @@
-/* Copyright (C) 1995-2022 Free Software Foundation, Inc.
+/* lshift.S: ARM optimized 64-bit integer shift
 
-This file is free software; you can redistribute it and/or modify it
-under the terms of the GNU General Public License as published by the
-Free Software Foundation; either version 3, or (at your option) any
-later version.
+   Copyright (C) 2018-2022 Free Software Foundation, Inc.
+   Contributed by Daniel Engel, Senva Inc (g...@danielengel.com)
 
-This file is distributed in the hope that it will be useful, but
-WITHOUT ANY WARRANTY; without even the implied warranty of
-MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-General Public License for more details.
+   This file is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by the
+   Free Software Foundation; either version 3, or (at your option) any
+   later version.
 
-Under Section 7 of GPL version 3, you are granted additional
-permissions described in the GCC Runtime Library Exception, version
-3.1, as published by the Free Software Foundation.
+   This file is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
 
-You should have received a copy of the GNU General Public License and
-a copy of the GCC Runtime Library Exception along with this program;
-see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
-<http://www.gnu.org/licenses/>.  */
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
 
 #ifdef L_lshrdi3
 
-        FUNC_START lshrdi3
-        FUNC_ALIAS aeabi_llsr lshrdi3
-
-#ifdef __thumb__
-        lsrs    al, r2
-        movs    r3, ah
-        lsrs    ah, r2
-        mov     ip, r3
-        subs    r2, #32
-        lsrs    r3, r2
-        orrs    al, r3
-        negs    r2, r2
-        mov     r3, ip
-        lsls    r3, r2
-        orrs    al, r3
-        RET
-#else
-        subs    r3, r2, #32
-        rsb     ip, r2, #32
-        movmi   al, al, lsr r2
-        movpl   al, ah, lsr r3
-        orrmi   al, al, ah, lsl ip
-        mov     ah, ah, lsr r2
-        RET
-#endif
-        FUNC_END aeabi_llsr
-        FUNC_END lshrdi3
-
-#endif
-
+// long long __aeabi_llsr(long long, int)
+// Logical shift right the 64 bit value in $r1:$r0 by the count in $r2.
+// The result is only guaranteed for shifts in the range of '0' to '63'.
+// Uses $r3 as scratch space.
+FUNC_START_SECTION aeabi_llsr .text.sorted.libgcc.lshrdi3
+FUNC_ALIAS lshrdi3 aeabi_llsr
+CFI_START_FUNCTION
+
+  #if defined(__thumb__) && __thumb__
+
+        // Save a copy for the remainder.
+        movs    r3, xxh
+
+        // Assume a simple shift.
+        lsrs    xxl, r2
+        lsrs    xxh, r2
+
+        // Test if the shift distance is larger than 1 word.
+        subs    r2, #32
+
+    #ifdef __HAVE_FEATURE_IT
+        do_it   lo, te
+
+        // The remainder is opposite the main shift, (32 - x) bits.
+        rsblo   r2, #0
+        lsllo   r3, r2
+
+        // The remainder shift extends into the hi word.
+        lsrhs   r3, r2
+
+    #else /* !__HAVE_FEATURE_IT */
+        bhs     LLSYM(__llsr_large)
+
+        // The remainder is opposite the main shift, (32 - x) bits.
+        rsbs    r2, #0
+        lsls    r3, r2
+
+        // Cancel any remaining shift.
+        eors    r2, r2
+
+    LLSYM(__llsr_large):
+        // Apply any remaining shift to the hi word.
+        lsrs    r3, r2
+
+    #endif /* !__HAVE_FEATURE_IT */
+
+        // Merge remainder and
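
For readers following the register-level logic: the Thumb-1 path above
computes the main shift on each word, then merges the (32 - count)
"remainder" bits of the high word into the low word.  A rough C
equivalent of the algorithm, for reference only (the function name is
illustrative, not a libgcc entry point):

#include <stdint.h>

/* The result is only defined for shift counts 0..63, as noted above.  */
uint64_t
llsr_sketch (uint64_t x, unsigned int n)
{
  uint32_t lo = (uint32_t)x;
  uint32_t hi = (uint32_t)(x >> 32);

  if (n < 32)
    {
      /* Main shift, plus the (32 - n) remainder bits of the hi word.
         The n == 0 guard avoids a shift by 32, undefined in C.  */
      lo = (lo >> n) | (n ? (hi << (32 - n)) : 0);
      hi >>= n;
    }
  else
    {
      /* The entire result comes from the hi word.  */
      lo = hi >> (n - 32);
      hi = 0;
    }
  return ((uint64_t)hi << 32) | lo;
}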
[PATCH v7 18/34] Merge Thumb-2 optimizations for 64-bit comparison
This effectively merges support for all architecture variants into a
common function path with appropriate build conditions.  ARM performance
is 1-2 instructions faster; Thumb-2 is about 50% faster.

gcc/libgcc/ChangeLog:
2022-10-09  Daniel Engel

        * config/arm/bpabi.S (__aeabi_lcmp, __aeabi_ulcmp): Removed.
        * config/arm/eabi/lcmp.S (__aeabi_lcmp, __aeabi_ulcmp): Added
        conditional execution on supported architectures (__ARM_FEATURE_IT).
        * config/arm/lib1funcs.S: Moved #include scope of eabi/lcmp.S.
---
 libgcc/config/arm/bpabi.S     | 42 ---
 libgcc/config/arm/eabi/lcmp.S | 47 ++-
 libgcc/config/arm/lib1funcs.S |  2 +-
 3 files changed, 47 insertions(+), 44 deletions(-)

diff --git a/libgcc/config/arm/bpabi.S b/libgcc/config/arm/bpabi.S
index 17fe707ddf3..531a64fa98d 100644
--- a/libgcc/config/arm/bpabi.S
+++ b/libgcc/config/arm/bpabi.S
@@ -34,48 +34,6 @@
 	.eabi_attribute 25, 1
 #endif /* __ARM_EABI__ */
 
-#ifdef L_aeabi_lcmp
-
-ARM_FUNC_START aeabi_lcmp
-        cmp     xxh, yyh
-        do_it   lt
-        movlt   r0, #-1
-        do_it   gt
-        movgt   r0, #1
-        do_it   ne
-        RETc(ne)
-        subs    r0, xxl, yyl
-        do_it   lo
-        movlo   r0, #-1
-        do_it   hi
-        movhi   r0, #1
-        RET
-        FUNC_END aeabi_lcmp
-
-#endif /* L_aeabi_lcmp */
-
-#ifdef L_aeabi_ulcmp
-
-ARM_FUNC_START aeabi_ulcmp
-        cmp     xxh, yyh
-        do_it   lo
-        movlo   r0, #-1
-        do_it   hi
-        movhi   r0, #1
-        do_it   ne
-        RETc(ne)
-        cmp     xxl, yyl
-        do_it   lo
-        movlo   r0, #-1
-        do_it   hi
-        movhi   r0, #1
-        do_it   eq
-        moveq   r0, #0
-        RET
-        FUNC_END aeabi_ulcmp
-
-#endif /* L_aeabi_ulcmp */
-
 .macro test_div_by_zero signed
 /* Tail-call to divide-by-zero handlers which may be overridden by the
    user, so unwinding works properly.  */

diff --git a/libgcc/config/arm/eabi/lcmp.S b/libgcc/config/arm/eabi/lcmp.S
index 99c7970ecba..d397325cbef 100644
--- a/libgcc/config/arm/eabi/lcmp.S
+++ b/libgcc/config/arm/eabi/lcmp.S
@@ -46,6 +46,19 @@ FUNC_START_SECTION LCMP_NAME LCMP_SECTION
         subs    xxl, yyl
         sbcs    xxh, yyh
 
+#ifdef __HAVE_FEATURE_IT
+        do_it   lt,t
+
+  #ifdef L_aeabi_lcmp
+        movlt   r0, #-1
+  #else
+        movlt   r0, #0
+  #endif
+
+        // Early return on '<'.
+        RETc(lt)
+
+#else /* !__HAVE_FEATURE_IT */
         // With $r2 free, create a known offset value without affecting
         // the N or Z flags.
         // BUG? The originally unified instruction for v6m was 'mov r2, r3'.
@@ -62,17 +75,27 @@
         // argument is larger, otherwise the offset value remains 0.
         adds    r2, #2
 
+#endif
+
         // Check for zero (equality in 64 bits).
         // It doesn't matter which register was originally "hi".
         orrs    r0, r1
 
+#ifdef __HAVE_FEATURE_IT
+        // The result is already 0 on equality.
+        // -1 already returned, so just force +1.
+        do_it   ne
+        movne   r0, #1
+
+#else /* !__HAVE_FEATURE_IT */
         // The result is already 0 on equality.
         beq     LLSYM(__lcmp_return)
 
-LLSYM(__lcmp_lt):
+    LLSYM(__lcmp_lt):
         // Create +1 or -1 from the offset value defined earlier.
         adds    r3, #1
         subs    r0, r2, r3
+#endif
 
 LLSYM(__lcmp_return):
 #ifdef L_cmpdi2
@@ -111,21 +134,43 @@ FUNC_START_SECTION ULCMP_NAME ULCMP_SECTION
         subs    xxl, yyl
         sbcs    xxh, yyh
 
+#ifdef __HAVE_FEATURE_IT
+        do_it   lo,t
+
+  #ifdef L_aeabi_ulcmp
+        movlo   r0, #-1
+  #else
+        movlo   r0, #0
+  #endif
+
+        // Early return on '<'.
+        RETc(lo)
+
+#else
         // Capture the carry flag.
         // $r2 will contain -1 if the first value is smaller,
        	// 0 if the first value is larger or equal.
         sbcs    r2, r2
+#endif
 
         // Check for zero (equality in 64 bits).
        	// It doesn't matter which register was originally "hi".
         orrs    r0, r1
 
+#ifdef __HAVE_FEATURE_IT
+        // The result is already 0 on equality.
+        // -1 already returned, so just force +1.
+        do_it   ne
+        movne   r0, #1
+
+#else /* !__HAVE_FEATURE_IT */
        	// The result is already 0 on equality.
         beq     LLSYM(__ulcmp_return)
 
        	// Assume +1.  If -1 is correct, $r2 will override.
         movs    r0, #1
         orrs    r0, r2
+#endif
 
 LLSYM(__ulcmp_return):
 #ifdef L_ucmpdi2
 
diff --git a/libgcc/config/arm/lib1funcs.S b/libgcc/config/arm/lib1funcs.S
index d85a20252d9..796f6f30ed9 100644
--- a/libgcc/config/arm/lib1funcs.S
+++ b
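
For reference, the 'subs'/'sbcs' pair above performs a full 64-bit
subtraction, so a single 'lt' (signed) or 'lo' (unsigned) condition is
valid for the whole value.  The contract both paths implement is the
usual three-way comparison (sketch only; names are illustrative):

int
lcmp_sketch (long long a, long long b)
{
  return (a < b) ? -1 : (a > b) ? 1 : 0;
}

int
ulcmp_sketch (unsigned long long a, unsigned long long b)
{
  return (a < b) ? -1 : (a > b) ? 1 : 0;
}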
[PATCH v7 13/34] Import 'ffs' functions from the CM0 library
This implementation provides an efficient tail call to __ctzsi2(), making
the functions rather smaller and faster than the C versions.

gcc/libgcc/ChangeLog:
2022-10-09  Daniel Engel

        * config/arm/bits/ctz2.S (__ffssi2, __ffsdi2): New functions.
        * config/arm/t-elf (LIB1ASMFUNCS): Added _ffssi2 and _ffsdi2.
---
 libgcc/config/arm/ctz2.S | 77 +++-
 libgcc/config/arm/t-elf  |  2 ++
 2 files changed, 78 insertions(+), 1 deletion(-)

diff --git a/libgcc/config/arm/ctz2.S b/libgcc/config/arm/ctz2.S
index 82c81c6ae11..d57acabae01 100644
--- a/libgcc/config/arm/ctz2.S
+++ b/libgcc/config/arm/ctz2.S
@@ -1,4 +1,4 @@
-/* ctz2.S: ARM optimized 'ctz' functions
+/* ctz2.S: ARM optimized 'ctz' and related functions
 
    Copyright (C) 2020-2022 Free Software Foundation, Inc.
    Contributed by Daniel Engel (g...@danielengel.com)
@@ -238,3 +238,78 @@ FUNC_END ctzdi2
 
 #endif /* L_ctzsi2 || L_ctzdi2 */
 
+
+#ifdef L_ffsdi2
+
+// int __ffsdi2(int)
+// Return the index of the least significant 1-bit in $r1:r0,
+// or zero if $r1:r0 is zero.  The least significant bit is index 1.
+// Returns the result in $r0.
+// Uses $r2 and possibly $r3 as scratch space.
+// Same section as __ctzsi2() for sake of the tail call branches.
+FUNC_START_SECTION ffsdi2 .text.sorted.libgcc.ctz2.ffsdi2
+CFI_START_FUNCTION
+
+        // Simplify branching by assuming a non-zero lower word.
+        // For all such, ffssi2(x) == ctzsi2(x) + 1.
+        movs    r2, #(33 - CTZ_RESULT_OFFSET)
+
+  #if defined(__ARMEB__) && __ARMEB__
+        // HACK: Save the upper word in a scratch register.
+        movs    r3, r0
+
+        // Test the lower word.
+        movs    r0, r1
+        bne     SYM(__internal_ctzsi2)
+
+        // Test the upper word.
+        movs    r2, #(65 - CTZ_RESULT_OFFSET)
+        movs    r0, r3
+        bne     SYM(__internal_ctzsi2)
+
+  #else /* !__ARMEB__ */
+        // Test the lower word.
+        cmp     r0, #0
+        bne     SYM(__internal_ctzsi2)
+
+        // Test the upper word.
+        movs    r2, #(65 - CTZ_RESULT_OFFSET)
+        movs    r0, r1
+        bne     SYM(__internal_ctzsi2)
+
+  #endif /* !__ARMEB__ */
+
+        // Upper and lower words are both zero.
+        RET
+
+CFI_END_FUNCTION
+FUNC_END ffsdi2
+
+#endif /* L_ffsdi2 */
+
+
+#ifdef L_ffssi2
+
+// int __ffssi2(int)
+// Return the index of the least significant 1-bit in $r0,
+// or zero if $r0 is zero.  The least significant bit is index 1.
+// Returns the result in $r0.
+// Uses $r2 and possibly $r3 as scratch space.
+// Same section as __ctzsi2() for sake of the tail call branches.
+FUNC_START_SECTION ffssi2 .text.sorted.libgcc.ctz2.ffssi2
+CFI_START_FUNCTION
+
+        // Simplify branching by assuming a non-zero argument.
+        // For all such, ffssi2(x) == ctzsi2(x) + 1.
+        movs    r2, #(33 - CTZ_RESULT_OFFSET)
+
+        // Test for zero, return unmodified.
+        cmp     r0, #0
+        bne     SYM(__internal_ctzsi2)
+        RET
+
+CFI_END_FUNCTION
+FUNC_END ffssi2
+
+#endif /* L_ffssi2 */
+
diff --git a/libgcc/config/arm/t-elf b/libgcc/config/arm/t-elf
index 89071cebe45..346fc766f17 100644
--- a/libgcc/config/arm/t-elf
+++ b/libgcc/config/arm/t-elf
@@ -35,6 +35,8 @@ LIB1ASMFUNCS += \
 	_clrsbdi2 \
 	_clzdi2 \
 	_ctzdi2 \
+	_ffssi2 \
+	_ffsdi2 \
 	_dvmd_tls \
 	_divsi3 \
 	_modsi3 \
-- 
2.34.1
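
The identity the tail call exploits, in C (sketch only; the GCC
builtins stand in for the library's ctz code path):

int
ffs_sketch (unsigned int x)
{
  return x ? __builtin_ctz (x) + 1 : 0;
}

int
ffsll_sketch (unsigned long long x)
{
  unsigned int lo = (unsigned int)x;
  unsigned int hi = (unsigned int)(x >> 32);
  if (lo) return __builtin_ctz (lo) + 1;   /* bit index 1..32 */
  if (hi) return __builtin_ctz (hi) + 33;  /* bit index 33..64 */
  return 0;
}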
[PATCH v7 12/34] Import 'clrsb' functions from the CM0 library
This implementation provides an efficient tail call to __clzsi2(), making the functions rather smaller and faster than the C versions. gcc/libgcc/ChangeLog: 2022-10-09 Daniel Engel * config/arm/bits/clz2.S (__clrsbsi2, __clrsbdi2): Added new functions. * config/arm/t-elf (LIB1ASMFUNCS): Added new function objects _clrsbsi2 and _clrsbdi2). --- libgcc/config/arm/clz2.S | 108 ++- libgcc/config/arm/t-elf | 2 + 2 files changed, 108 insertions(+), 2 deletions(-) diff --git a/libgcc/config/arm/clz2.S b/libgcc/config/arm/clz2.S index ed04698fef4..3d40811278b 100644 --- a/libgcc/config/arm/clz2.S +++ b/libgcc/config/arm/clz2.S @@ -1,4 +1,4 @@ -/* clz2.S: Cortex M0 optimized 'clz' functions +/* clz2.S: ARM optimized 'clz' and related functions Copyright (C) 2018-2022 Free Software Foundation, Inc. Contributed by Daniel Engel (g...@danielengel.com) @@ -23,7 +23,7 @@ <http://www.gnu.org/licenses/>. */ -#if defined(__ARM_FEATURE_CLZ) && __ARM_FEATURE_CLZ +#ifdef __ARM_FEATURE_CLZ #ifdef L_clzdi2 @@ -242,3 +242,107 @@ FUNC_END clzdi2 #endif /* !__ARM_FEATURE_CLZ */ + +#ifdef L_clrsbdi2 + +// int __clrsbdi2(int) +// Counts the number of "redundant sign bits" in $r1:$r0. +// Returns the result in $r0. +// Uses $r2 and $r3 as scratch space. +FUNC_START_SECTION clrsbdi2 .text.sorted.libgcc.clz2.clrsbdi2 +CFI_START_FUNCTION + + #if defined(__ARM_FEATURE_CLZ) && __ARM_FEATURE_CLZ +// Invert negative signs to keep counting zeros. +asrsr3, xxh,#31 +eorsxxl,r3 +eorsxxh,r3 + +// Same as __clzdi2(), except that the 'C' flag is pre-calculated. +// Also, the trailing 'subs', since the last bit is not redundant. +do_it eq, et +clzeq r0, xxl +clzne r0, xxh +addeq r0, #32 +subsr0, #1 +RET + + #else /* !__ARM_FEATURE_CLZ */ +// Result if all the bits in the argument are zero. +// Set it here to keep the flags clean after 'eors' below. +movsr2, #31 + +// Invert negative signs to keep counting zeros. +asrsr3, xxh,#31 +eorsxxh,r3 + +#if defined(__ARMEB__) && __ARMEB__ +// If the upper word is non-zero, return '__clzsi2(upper) - 1'. +bne SYM(__internal_clzsi2) + +// The upper word is zero, prepare the lower word. +movsr0, r1 +eorsr0, r3 + +#else /* !__ARMEB__ */ +// Save the lower word temporarily. +// This somewhat awkward construction adds one cycle when the +// branch is not taken, but prevents a double-branch. +eorsr3, r0 + +// If the upper word is non-zero, return '__clzsi2(upper) - 1'. +movsr0, r1 +bneSYM(__internal_clzsi2) + +// Restore the lower word. +movsr0, r3 + +#endif /* !__ARMEB__ */ + +// The upper word is zero, return '31 + __clzsi2(lower)'. +addsr2, #32 +b SYM(__internal_clzsi2) + + #endif /* !__ARM_FEATURE_CLZ */ + +CFI_END_FUNCTION +FUNC_END clrsbdi2 + +#endif /* L_clrsbdi2 */ + + +#ifdef L_clrsbsi2 + +// int __clrsbsi2(int) +// Counts the number of "redundant sign bits" in $r0. +// Returns the result in $r0. +// Uses $r2 and possibly $r3 as scratch space. +FUNC_START_SECTION clrsbsi2 .text.sorted.libgcc.clz2.clrsbsi2 +CFI_START_FUNCTION + +// Invert negative signs to keep counting zeros. +asrsr2, r0,#31 +eorsr0, r2 + + #if defined(__ARM_FEATURE_CLZ) && __ARM_FEATURE_CLZ +// Count. +clz r0, r0 + +// The result for a positive value will always be >= 1. +// By definition, the last bit is not redundant. +subsr0, #1 +RET + + #else /* !__ARM_FEATURE_CLZ */ +// Result if all the bits in the argument are zero. +// By definition, the last bit is not redundant. 
+movsr2, #31 +b SYM(__internal_clzsi2) + + #endif /* !__ARM_FEATURE_CLZ */ + +CFI_END_FUNCTION +FUNC_END clrsbsi2 + +#endif /* L_clrsbsi2 */ + diff --git a/libgcc/config/arm/t-elf b/libgcc/config/arm/t-elf index 33b83ac4adf..89071cebe45 100644 --- a/libgcc/config/arm/t-elf +++ b/libgcc/config/arm/t-elf @@ -31,6 +31,8 @@ LIB1ASMFUNCS += \ _ashldi3 \ _ashrdi3 \ _lshrdi3 \ + _clrsbsi2 \ + _clrsbdi2 \ _clzdi2 \ _ctzdi2 \ _dvmd_tls \ -- 2.34.1
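
"Redundant sign bits" are the bits below the sign bit that equal it.
XOR-ing with the replicated sign converts the count into a count of
leading zeros, minus one because the sign bit itself is by definition
not redundant.  A C sketch of the identity (illustrative only):

int
clrsb_sketch (int x)
{
  /* Assumes arithmetic right shift, as the 'asrs' above.  */
  unsigned int u = (unsigned int)(x ^ (x >> 31));

  /* u == 0 exactly when x is 0 or -1, where all 31 remaining bits
     match the sign.  The guard also avoids __builtin_clz (0).  */
  return u ? __builtin_clz (u) - 1 : 31;
}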
[PATCH v7 16/34] Refactor Thumb-1 64-bit comparison into a new file
This will make it easier to isolate changes in subsequent patches. gcc/libgcc/ChangeLog: 2022-10-09 Daniel Engel * config/arm/bpabi-v6m.S (__aeabi_lcmp, __aeabi_ulcmp): Moved to ... * config/arm/eabi/lcmp.S: New file. * config/arm/lib1funcs.S: #include eabi/lcmp.S. --- libgcc/config/arm/bpabi-v6m.S | 46 -- libgcc/config/arm/eabi/lcmp.S | 73 +++ libgcc/config/arm/lib1funcs.S | 1 + 3 files changed, 74 insertions(+), 46 deletions(-) create mode 100644 libgcc/config/arm/eabi/lcmp.S diff --git a/libgcc/config/arm/bpabi-v6m.S b/libgcc/config/arm/bpabi-v6m.S index ea01d3f4d5f..3757e99508e 100644 --- a/libgcc/config/arm/bpabi-v6m.S +++ b/libgcc/config/arm/bpabi-v6m.S @@ -33,52 +33,6 @@ .eabi_attribute 25, 1 #endif /* __ARM_EABI__ */ -#ifdef L_aeabi_lcmp - -FUNC_START aeabi_lcmp - cmp xxh, yyh - beq 1f - bgt 2f - movsr0, #1 - negsr0, r0 - RET -2: - movsr0, #1 - RET -1: - subsr0, xxl, yyl - beq 1f - bhi 2f - movsr0, #1 - negsr0, r0 - RET -2: - movsr0, #1 -1: - RET - FUNC_END aeabi_lcmp - -#endif /* L_aeabi_lcmp */ - -#ifdef L_aeabi_ulcmp - -FUNC_START aeabi_ulcmp - cmp xxh, yyh - bne 1f - subsr0, xxl, yyl - beq 2f -1: - bcs 1f - movsr0, #1 - negsr0, r0 - RET -1: - movsr0, #1 -2: - RET - FUNC_END aeabi_ulcmp - -#endif /* L_aeabi_ulcmp */ .macro test_div_by_zero signed cmp yyh, #0 diff --git a/libgcc/config/arm/eabi/lcmp.S b/libgcc/config/arm/eabi/lcmp.S new file mode 100644 index 000..336db1d398c --- /dev/null +++ b/libgcc/config/arm/eabi/lcmp.S @@ -0,0 +1,73 @@ +/* Miscellaneous BPABI functions. Thumb-1 implementation, suitable for ARMv4T, + ARMv6-M and ARMv8-M Baseline like ISA variants. + + Copyright (C) 2006-2020 Free Software Foundation, Inc. + Contributed by CodeSourcery. + + This file is free software; you can redistribute it and/or modify it + under the terms of the GNU General Public License as published by the + Free Software Foundation; either version 3, or (at your option) any + later version. + + This file is distributed in the hope that it will be useful, but + WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + General Public License for more details. + + Under Section 7 of GPL version 3, you are granted additional + permissions described in the GCC Runtime Library Exception, version + 3.1, as published by the Free Software Foundation. + + You should have received a copy of the GNU General Public License and + a copy of the GCC Runtime Library Exception along with this program; + see the files COPYING3 and COPYING.RUNTIME respectively. If not, see + <http://www.gnu.org/licenses/>. */ + + +#ifdef L_aeabi_lcmp + +FUNC_START aeabi_lcmp +cmp xxh, yyh +beq 1f +bgt 2f +movsr0, #1 +negsr0, r0 +RET +2: +movsr0, #1 +RET +1: +subsr0, xxl, yyl +beq 1f +bhi 2f +movsr0, #1 +negsr0, r0 +RET +2: +movsr0, #1 +1: +RET +FUNC_END aeabi_lcmp + +#endif /* L_aeabi_lcmp */ + +#ifdef L_aeabi_ulcmp + +FUNC_START aeabi_ulcmp +cmp xxh, yyh +bne 1f +subsr0, xxl, yyl +beq 2f +1: +bcs 1f +movsr0, #1 +negsr0, r0 +RET +1: +movsr0, #1 +2: +RET +FUNC_END aeabi_ulcmp + +#endif /* L_aeabi_ulcmp */ + diff --git a/libgcc/config/arm/lib1funcs.S b/libgcc/config/arm/lib1funcs.S index 0eb6d1d52a7..d85a20252d9 100644 --- a/libgcc/config/arm/lib1funcs.S +++ b/libgcc/config/arm/lib1funcs.S @@ -1991,5 +1991,6 @@ LSYM(Lchange_\register): #include "bpabi.S" #else /* NOT_ISA_TARGET_32BIT */ #include "bpabi-v6m.S" +#include "eabi/lcmp.S" #endif /* NOT_ISA_TARGET_32BIT */ #endif /* !__symbian__ */ -- 2.34.1
[PATCH v7 21/34] Import 64-bit division from the CM0 library
gcc/libgcc/ChangeLog: 2022-10-09 Daniel Engel * config/arm/bpabi.c: Deleted unused file. * config/arm/eabi/ldiv.S (__aeabi_ldivmod, __aeabi_uldivmod): Replaced wrapper functions with a complete implementation. * config/arm/t-bpabi (LIB2ADD_ST): Removed bpabi.c. * config/arm/t-elf (LIB1ASMFUNCS): Added _divdi3 and _udivdi3. --- libgcc/config/arm/bpabi.c | 42 --- libgcc/config/arm/eabi/ldiv.S | 542 +- libgcc/config/arm/t-bpabi | 3 +- libgcc/config/arm/t-elf | 9 + 4 files changed, 474 insertions(+), 122 deletions(-) delete mode 100644 libgcc/config/arm/bpabi.c diff --git a/libgcc/config/arm/bpabi.c b/libgcc/config/arm/bpabi.c deleted file mode 100644 index d8ba940d1ff..000 --- a/libgcc/config/arm/bpabi.c +++ /dev/null @@ -1,42 +0,0 @@ -/* Miscellaneous BPABI functions. - - Copyright (C) 2003-2022 Free Software Foundation, Inc. - Contributed by CodeSourcery, LLC. - - This file is free software; you can redistribute it and/or modify it - under the terms of the GNU General Public License as published by the - Free Software Foundation; either version 3, or (at your option) any - later version. - - This file is distributed in the hope that it will be useful, but - WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU - General Public License for more details. - - Under Section 7 of GPL version 3, you are granted additional - permissions described in the GCC Runtime Library Exception, version - 3.1, as published by the Free Software Foundation. - - You should have received a copy of the GNU General Public License and - a copy of the GCC Runtime Library Exception along with this program; - see the files COPYING3 and COPYING.RUNTIME respectively. If not, see - <http://www.gnu.org/licenses/>. */ - -extern long long __divdi3 (long long, long long); -extern unsigned long long __udivdi3 (unsigned long long, -unsigned long long); -extern long long __gnu_ldivmod_helper (long long, long long, long long *); - - -long long -__gnu_ldivmod_helper (long long a, - long long b, - long long *remainder) -{ - long long quotient; - - quotient = __divdi3 (a, b); - *remainder = a - b * quotient; - return quotient; -} - diff --git a/libgcc/config/arm/eabi/ldiv.S b/libgcc/config/arm/eabi/ldiv.S index 3c8280ef580..e3ba6497761 100644 --- a/libgcc/config/arm/eabi/ldiv.S +++ b/libgcc/config/arm/eabi/ldiv.S @@ -1,8 +1,7 @@ -/* Miscellaneous BPABI functions. Thumb-1 implementation, suitable for ARMv4T, - ARMv6-M and ARMv8-M Baseline like ISA variants. +/* ldiv.S: Thumb-1 optimized 64-bit integer division - Copyright (C) 2006-2020 Free Software Foundation, Inc. - Contributed by CodeSourcery. + Copyright (C) 2018-2022 Free Software Foundation, Inc. + Contributed by Daniel Engel, Senva Inc (g...@danielengel.com) This file is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the @@ -24,84 +23,471 @@ <http://www.gnu.org/licenses/>. */ -.macro test_div_by_zero signed -cmp yyh, #0 -bne 7f -cmp yyl, #0 -bne 7f -cmp xxh, #0 -.ifc\signed, unsigned -bne 2f -cmp xxl, #0 -2: -beq 3f -movsxxh, #0 -mvnsxxh, xxh@ 0x -movsxxl, xxh -3: -.else -blt 6f -bgt 4f -cmp xxl, #0 -beq 5f -4: movsxxl, #0 -mvnsxxl, xxl@ 0x -lsrsxxh, xxl, #1@ 0x7fff -b 5f -6: movsxxh, #0x80 -lslsxxh, xxh, #24 @ 0x8000 -movsxxl, #0 -5: -.endif -@ tailcalls are tricky on v6-m. -push{r0, r1, r2} -ldr r0, 1f -adr r1, 1f -addsr0, r1 -str r0, [sp, #8] -@ We know we are not on armv4t, so pop pc is safe. 
-pop {r0, r1, pc} -.align 2 -1: -.word __aeabi_ldiv0 - 1b -7: -.endm - -#ifdef L_aeabi_ldivmod - -FUNC_START aeabi_ldivmod -test_div_by_zero signed - -push{r0, r1} -mov r0, sp -push{r0, lr} -ldr r0, [sp, #8] -bl SYM(__gnu_ldivmod_helper) -ldr r3, [sp, #4] -mov lr, r3 -add sp, sp, #8 -pop {r2, r3} +#ifndef __GNUC__ + +// long long __aeabi_ldiv0(long long) +// Helper function for division by 0. +WEAK_START_SECTION aeabi_ldiv0 .text.sorted.libgcc.ldiv.ldiv0 +CFI_START_FUNCTION + + #if defined(TRAP_EXCEPTIONS) && TRAP_EXCEPTIONS +svc #(SVC_DIVISION_BY_ZERO) + #endif + R
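
For reference, the AEABI contract the new implementation provides
(sketch only; the real routine returns the quotient in r1:r0 and the
remainder in r3:r2, which C cannot express directly):

#include <stdint.h>

typedef struct { int64_t quot; int64_t rem; } lldiv_sketch_t;

lldiv_sketch_t
ldivmod_sketch (int64_t n, int64_t d)
{
  lldiv_sketch_t r;
  r.quot = n / d;            /* d == 0 is diverted to __aeabi_ldiv0() */
  r.rem = n - d * r.quot;    /* remainder takes the numerator's sign */
  return r;
}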
[PATCH v7 14/34] Import 'parity' functions from the CM0 library
The functional overlap between the single- and double-word functions
makes this implementation about half the size of the C functions if
both functions are linked in the same application.

gcc/libgcc/ChangeLog:
2022-10-09  Daniel Engel

        * config/arm/parity.S: New file for __paritysi2/di2().
        * config/arm/lib1funcs.S: #include parity.S.
        * config/arm/t-elf (LIB1ASMFUNCS): Added _paritysi2/di2.
---
 libgcc/config/arm/lib1funcs.S |   1 +
 libgcc/config/arm/parity.S    | 120 ++
 libgcc/config/arm/t-elf       |   2 +
 3 files changed, 123 insertions(+)
 create mode 100644 libgcc/config/arm/parity.S

diff --git a/libgcc/config/arm/lib1funcs.S b/libgcc/config/arm/lib1funcs.S
index aa5957b8399..3f7b9e739f0 100644
--- a/libgcc/config/arm/lib1funcs.S
+++ b/libgcc/config/arm/lib1funcs.S
@@ -1704,6 +1704,7 @@ LSYM(Lover12):
 
 #include "clz2.S"
 #include "ctz2.S"
+#include "parity.S"
 
 /* ------------------------------------------------------------------ */
 /* These next two sections are here despite the fact that they contain Thumb
diff --git a/libgcc/config/arm/parity.S b/libgcc/config/arm/parity.S
new file mode 100644
index 000..1405bea93a3
--- /dev/null
+++ b/libgcc/config/arm/parity.S
@@ -0,0 +1,120 @@
+/* parity.S: ARM optimized parity functions
+
+   Copyright (C) 2020-2022 Free Software Foundation, Inc.
+   Contributed by Daniel Engel (g...@danielengel.com)
+
+   This file is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by the
+   Free Software Foundation; either version 3, or (at your option) any
+   later version.
+
+   This file is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+
+#ifdef L_paritydi2
+
+// int __paritydi2(int)
+// Returns '0' if the number of bits set in $r1:r0 is even, and '1' otherwise.
+// Returns the result in $r0.
+FUNC_START_SECTION paritydi2 .text.sorted.libgcc.paritydi2
+CFI_START_FUNCTION
+
+        // Combine the upper and lower words, then fall through.
+        // Byte-endianness does not matter for this function.
+        eors    r0, r1
+
+#endif /* L_paritydi2 */
+
+
+// The implementation of __paritydi2() tightly couples with __paritysi2(),
+// such that instructions must appear consecutively in the same memory
+// section for proper flow control.  However, this construction inhibits
+// the ability to discard __paritydi2() when only using __paritysi2().
+// Therefore, this block configures __paritysi2() for compilation twice.
+// The first version is a minimal standalone implementation, and the second
+// version is the continuation of __paritydi2().  The standalone version must
+// be declared WEAK, so that the combined version can supersede it and
+// provide both symbols when required.
+// '_paritysi2' should appear before '_paritydi2' in LIB1ASMFUNCS.
+#if defined(L_paritysi2) || defined(L_paritydi2)
+
+#ifdef L_paritysi2
+// int __paritysi2(int)
+// Returns '0' if the number of bits set in $r0 is even, and '1' otherwise.
+// Returns the result in $r0.
+// Uses $r2 as scratch space.
+WEAK_START_SECTION paritysi2 .text.sorted.libgcc.paritysi2 +CFI_START_FUNCTION + +#else /* L_paritydi2 */ +FUNC_ENTRY paritysi2 + +#endif + + #if defined(__thumb__) && __thumb__ +#if defined(__OPTIMIZE_SIZE__) && __OPTIMIZE_SIZE__ + +// Size optimized: 16 bytes, 40 cycles +// Speed optimized: 24 bytes, 14 cycles +movsr2, #16 + +LLSYM(__parity_loop): +// Calculate the parity of successively smaller half-words into the MSB. +movsr1, r0 +lslsr1, r2 +eorsr0, r1 +lsrsr2, #1 +bne LLSYM(__parity_loop) + +#else /* !__OPTIMIZE_SIZE__ */ + +// Unroll the loop. The 'libgcc' reference C implementation replaces +// the x2 and the x1 shifts with a constant. However, since it takes +// 4 cycles to load, index, and mask the constant result, it doesn't +// cost anything to keep shifting (and saves a few bytes). +lslsr1, r0, #16 +eorsr0, r1 +lslsr1, r0,
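
The XOR-folding both variants perform reduces the parity of 32 (or 64)
bits to a single bit.  The assembly accumulates toward the MSB;
folding toward the LSB, as below, is equivalent (sketch only, names
illustrative):

int
paritysi_sketch (unsigned int x)
{
  x ^= x >> 16;
  x ^= x >> 8;
  x ^= x >> 4;
  x ^= x >> 2;
  x ^= x >> 1;
  return x & 1;
}

int
paritydi_sketch (unsigned long long x)
{
  /* As in __paritydi2(), combine the two words first.  */
  return paritysi_sketch ((unsigned int)(x ^ (x >> 32)));
}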
[PATCH v7 19/34] Import 32-bit division from the CM0 library
gcc/libgcc/ChangeLog: 2022-10-09 Daniel Engel * config/arm/eabi/idiv.S: New file for __udivsi3() and __divsi3(). * config/arm/lib1funcs.S: #include eabi/idiv.S (v6m only). --- libgcc/config/arm/eabi/idiv.S | 299 ++ libgcc/config/arm/lib1funcs.S | 19 ++- 2 files changed, 317 insertions(+), 1 deletion(-) create mode 100644 libgcc/config/arm/eabi/idiv.S diff --git a/libgcc/config/arm/eabi/idiv.S b/libgcc/config/arm/eabi/idiv.S new file mode 100644 index 000..6e54863611a --- /dev/null +++ b/libgcc/config/arm/eabi/idiv.S @@ -0,0 +1,299 @@ +/* div.S: Thumb-1 size-optimized 32-bit integer division + + Copyright (C) 2018-2022 Free Software Foundation, Inc. + Contributed by Daniel Engel, Senva Inc (g...@danielengel.com) + + This file is free software; you can redistribute it and/or modify it + under the terms of the GNU General Public License as published by the + Free Software Foundation; either version 3, or (at your option) any + later version. + + This file is distributed in the hope that it will be useful, but + WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + General Public License for more details. + + Under Section 7 of GPL version 3, you are granted additional + permissions described in the GCC Runtime Library Exception, version + 3.1, as published by the Free Software Foundation. + + You should have received a copy of the GNU General Public License and + a copy of the GCC Runtime Library Exception along with this program; + see the files COPYING3 and COPYING.RUNTIME respectively. If not, see + <http://www.gnu.org/licenses/>. */ + + +#ifndef __GNUC__ + +// int __aeabi_idiv0(int) +// Helper function for division by 0. +WEAK_START_SECTION aeabi_idiv0 .text.sorted.libgcc.idiv.idiv0 +FUNC_ALIAS cm0_idiv0 aeabi_idiv0 +CFI_START_FUNCTION + + #if defined(TRAP_EXCEPTIONS) && TRAP_EXCEPTIONS +svc #(SVC_DIVISION_BY_ZERO) + #endif + +RET + +CFI_END_FUNCTION +FUNC_END cm0_idiv0 +FUNC_END aeabi_idiv0 + +#endif /* !__GNUC__ */ + + +#ifdef L_divsi3 + +// int __aeabi_idiv(int, int) +// idiv_return __aeabi_idivmod(int, int) +// Returns signed $r0 after division by $r1. +// Also returns the signed remainder in $r1. +// Same parent section as __divsi3() to keep branches within range. +FUNC_START_SECTION divsi3 .text.sorted.libgcc.idiv.divsi3 + +#ifndef __symbian__ + FUNC_ALIAS aeabi_idiv divsi3 + FUNC_ALIAS aeabi_idivmod divsi3 +#endif + +CFI_START_FUNCTION + +// Extend signs. +asrsr2, r0, #31 +asrsr3, r1, #31 + +// Absolute value of the denominator, abort on division by zero. +eorsr1, r3 +subsr1, r3 + #if defined(PEDANTIC_DIV0) && PEDANTIC_DIV0 +beq LLSYM(__idivmod_zero) + #else +beq SYM(__uidivmod_zero) + #endif + +// Absolute value of the numerator. +eorsr0, r2 +subsr0, r2 + +// Keep the sign of the numerator in bit[31] (for the remainder). +// Save the XOR of the signs in bits[15:0] (for the quotient). +push{ rT, lr } +.cfi_remember_state +.cfi_adjust_cfa_offset 8 +.cfi_rel_offset rT, 0 +.cfi_rel_offset lr, 4 + +lsrsrT, r3, #16 +eorsrT, r2 + +// Handle division as unsigned. +bl SYM(__uidivmod_nonzero) __PLT__ + +// Set the sign of the remainder. +asrsr2, rT, #31 +eorsr1, r2 +subsr1, r2 + +// Set the sign of the quotient. 
+sxthr3, rT +eorsr0, r3 +subsr0, r3 + +LLSYM(__idivmod_return): +pop { rT, pc } +.cfi_restore_state + + #if defined(PEDANTIC_DIV0) && PEDANTIC_DIV0 +LLSYM(__idivmod_zero): +// Set up the *div0() parameter specified in the ARM runtime ABI: +// * 0 if the numerator is 0, +// * Or, the largest value of the type manipulated by the calling +// division function if the numerator is positive, +// * Or, the least value of the type manipulated by the calling +// division function if the numerator is negative. +subsr1, r0 +orrsr0, r1 +asrsr0, #31 +lsrsr0, #1 +eorsr0, r2 + +// At least the __aeabi_idiv0() call is common. +b SYM(__uidivmod_zero2) + #endif /* PEDANTIC_DIV0 */ + +CFI_END_FUNCTION +FUNC_END divsi3 + +#ifndef __symbian__ + FUNC_END aeabi_idiv + FUNC_END aeabi_idivmod +#endif + +#endif /* L_divsi3 */ + + +#ifdef L_udivsi3 + +// int __aeabi_uidiv(unsigned int, unsigned int) +// idiv_return __aeabi_uidivmod(unsigned int, unsigned
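
The sign handling above reduces signed division to unsigned division:
both operands are made positive, and the signs are patched in after
the unsigned helper returns.  A C sketch of the same bookkeeping
('udivmod' is a hypothetical stand-in for __uidivmod_nonzero(), not a
real entry point; assumes arithmetic right shift, as on ARM):

extern unsigned int udivmod (unsigned int n, unsigned int d,
                             unsigned int *rem);  /* hypothetical */

int
idivmod_sketch (int n, int d, int *rem)
{
  /* Replicate each sign into all 32 bits (0 or -1), as 'asrs ..., #31'.  */
  unsigned int sn = (unsigned int)(n >> 31);
  unsigned int sd = (unsigned int)(d >> 31);

  /* Two's-complement absolute values, as the 'eors'/'subs' pairs.  */
  unsigned int un = ((unsigned int)n ^ sn) - sn;
  unsigned int ud = ((unsigned int)d ^ sd) - sd;

  unsigned int ur;
  unsigned int uq = udivmod (un, ud, &ur);

  /* The remainder takes the sign of the numerator; the quotient
     takes the XOR of the two operand signs.  */
  *rem = (int)((ur ^ sn) - sn);
  return (int)((uq ^ (sn ^ sd)) - (sn ^ sd));
}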
[PATCH v7 22/34] Import integer multiplication from the CM0 library
gcc/libgcc/ChangeLog: 2022-10-09 Daniel Engel * config/arm/eabi/lmul.S: New file for __muldi3(), __mulsidi3(), and __umulsidi3(). * config/arm/lib1funcs.S: #eabi/lmul.S (v6m only). * config/arm/t-elf: Add the new objects to LIB1ASMFUNCS. --- libgcc/config/arm/eabi/lmul.S | 218 ++ libgcc/config/arm/lib1funcs.S | 1 + libgcc/config/arm/t-elf | 13 +- 3 files changed, 230 insertions(+), 2 deletions(-) create mode 100644 libgcc/config/arm/eabi/lmul.S diff --git a/libgcc/config/arm/eabi/lmul.S b/libgcc/config/arm/eabi/lmul.S new file mode 100644 index 000..377e571bf09 --- /dev/null +++ b/libgcc/config/arm/eabi/lmul.S @@ -0,0 +1,218 @@ +/* lmul.S: Thumb-1 optimized 64-bit integer multiplication + + Copyright (C) 2018-2022 Free Software Foundation, Inc. + Contributed by Daniel Engel, Senva Inc (g...@danielengel.com) + + This file is free software; you can redistribute it and/or modify it + under the terms of the GNU General Public License as published by the + Free Software Foundation; either version 3, or (at your option) any + later version. + + This file is distributed in the hope that it will be useful, but + WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + General Public License for more details. + + Under Section 7 of GPL version 3, you are granted additional + permissions described in the GCC Runtime Library Exception, version + 3.1, as published by the Free Software Foundation. + + You should have received a copy of the GNU General Public License and + a copy of the GCC Runtime Library Exception along with this program; + see the files COPYING3 and COPYING.RUNTIME respectively. If not, see + <http://www.gnu.org/licenses/>. */ + + +#ifdef L_muldi3 + +// long long __aeabi_lmul(long long, long long) +// Returns the least significant 64 bits of a 64 bit multiplication. +// Expects the two multiplicands in $r1:$r0 and $r3:$r2. +// Returns the product in $r1:$r0 (does not distinguish signed types). +// Uses $r4 and $r5 as scratch space. +// Same parent section as __umulsidi3() to keep tail call branch within range. +FUNC_START_SECTION muldi3 .text.sorted.libgcc.lmul.muldi3 + +#ifndef __symbian__ + FUNC_ALIAS aeabi_lmul muldi3 +#endif + +CFI_START_FUNCTION + +// $r1:$r0 = 0x +// $r3:$r2 = 0x + +// The following operations that only affect the upper 64 bits +// can be safely discarded: +// * +// * +// * +// * +// * +// * + +// MAYBE: Test for multiply by ZERO on implementations with a 32-cycle +// 'muls' instruction, and skip over the operation in that case. + +// (0x * 0x), free $r1 +mulsxxh,yyl + +// (0x * 0x), free $r3 +mulsyyh,xxl +addsyyh,xxh + +// Put the parameters in the correct form for umulsidi3(). +movsxxh,yyl +b LLSYM(__mul_overflow) + +CFI_END_FUNCTION +FUNC_END muldi3 + +#ifndef __symbian__ + FUNC_END aeabi_lmul +#endif + +#endif /* L_muldi3 */ + + +// The following implementation of __umulsidi3() integrates with __muldi3() +// above to allow the fast tail call while still preserving the extra +// hi-shifted bits of the result. However, these extra bits add a few +// instructions not otherwise required when using only __umulsidi3(). +// Therefore, this block configures __umulsidi3() for compilation twice. +// The first version is a minimal standalone implementation, and the second +// version adds the hi bits of __muldi3(). The standalone version must +// be declared WEAK, so that the combined version can supersede it and +// provide both symbols in programs that multiply long doubles. 
+// This means '_umulsidi3' should appear before '_muldi3' in LIB1ASMFUNCS. +#if defined(L_muldi3) || defined(L_umulsidi3) + +#ifdef L_umulsidi3 +// unsigned long long __umulsidi3(unsigned int, unsigned int) +// Returns all 64 bits of a 32 bit multiplication. +// Expects the two multiplicands in $r0 and $r1. +// Returns the product in $r1:$r0. +// Uses $r3, $r4 and $ip as scratch space. +WEAK_START_SECTION umulsidi3 .text.sorted.libgcc.lmul.umulsidi3 +CFI_START_FUNCTION + +#else /* L_muldi3 */ +FUNC_ENTRY umulsidi3 +CFI_START_FUNCTION + +// 32x32 multiply with 64 bit result. +// Expand the multiply into 4 parts, since muls only returns 32 bits. +// (a16h * b16h / 2^32) +// + (a16h * b16l / 2^48) + (a16l * b16h / 2^48) +// + (a16l * b16l / 2^64) + +// MAYBE: Test for multiply by 0 on implementations with a 32-cycle +// 'muls' instruc
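
The decomposition described in the comments above, restated in C
(sketch only; the function names are illustrative):

#include <stdint.h>

/* 32x32 -> 64 built from four 16x16 -> 32 partial products, since
   'muls' returns only the low 32 bits of a product.  */
uint64_t
umulsidi3_sketch (uint32_t a, uint32_t b)
{
  uint32_t al = a & 0xFFFF, ah = a >> 16;
  uint32_t bl = b & 0xFFFF, bh = b >> 16;

  uint64_t r = (uint64_t)(ah * bh) << 32;   /* high x high */
  r += (uint64_t)(ah * bl) << 16;           /* cross products */
  r += (uint64_t)(al * bh) << 16;
  r += al * bl;                             /* low x low */
  return r;
}

/* __aeabi_lmul() keeps only the low 64 bits, so the upper operand
   words contribute just two more (truncating) 32-bit products.  */
uint64_t
muldi3_sketch (uint64_t a, uint64_t b)
{
  uint32_t al = (uint32_t)a, ah = (uint32_t)(a >> 32);
  uint32_t bl = (uint32_t)b, bh = (uint32_t)(b >> 32);
  return umulsidi3_sketch (al, bl)
         + ((uint64_t)(ah * bl + al * bh) << 32);
}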
[PATCH v7 25/34] Refactor Thumb-1 float subtraction into a new file
This will make it easier to isolate changes in subsequent patches. gcc/libgcc/ChangeLog: 2022-10-09 Daniel Engel * config/arm/bpabi-v6m.S (__aeabi_frsub): Moved to ... * config/arm/eabi/fadd.S: New file. * config/arm/lib1funcs.S: #include eabi/fadd.S (v6m only). --- libgcc/config/arm/bpabi-v6m.S | 16 --- libgcc/config/arm/eabi/fadd.S | 38 +++ libgcc/config/arm/lib1funcs.S | 1 + 3 files changed, 39 insertions(+), 16 deletions(-) create mode 100644 libgcc/config/arm/eabi/fadd.S diff --git a/libgcc/config/arm/bpabi-v6m.S b/libgcc/config/arm/bpabi-v6m.S index 8e0a45f4716..afba648ec57 100644 --- a/libgcc/config/arm/bpabi-v6m.S +++ b/libgcc/config/arm/bpabi-v6m.S @@ -33,22 +33,6 @@ .eabi_attribute 25, 1 #endif /* __ARM_EABI__ */ - -#ifdef L_arm_addsubsf3 - -FUNC_START aeabi_frsub - - push {r4, lr} - movs r4, #1 - lsls r4, #31 - eors r0, r0, r4 - bl __aeabi_fadd - pop {r4, pc} - - FUNC_END aeabi_frsub - -#endif /* L_arm_addsubsf3 */ - #ifdef L_arm_addsubdf3 FUNC_START aeabi_drsub diff --git a/libgcc/config/arm/eabi/fadd.S b/libgcc/config/arm/eabi/fadd.S new file mode 100644 index 000..fffbd91d1bc --- /dev/null +++ b/libgcc/config/arm/eabi/fadd.S @@ -0,0 +1,38 @@ +/* Copyright (C) 2006-2021 Free Software Foundation, Inc. + Contributed by CodeSourcery. + + This file is free software; you can redistribute it and/or modify it + under the terms of the GNU General Public License as published by the + Free Software Foundation; either version 3, or (at your option) any + later version. + + This file is distributed in the hope that it will be useful, but + WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + General Public License for more details. + + Under Section 7 of GPL version 3, you are granted additional + permissions described in the GCC Runtime Library Exception, version + 3.1, as published by the Free Software Foundation. + + You should have received a copy of the GNU General Public License and + a copy of the GCC Runtime Library Exception along with this program; + see the files COPYING3 and COPYING.RUNTIME respectively. If not, see + <http://www.gnu.org/licenses/>. */ + + +#ifdef L_arm_addsubsf3 + +FUNC_START aeabi_frsub + + push {r4, lr} + movs r4, #1 + lsls r4, #31 + eors r0, r0, r4 + bl __aeabi_fadd + pop {r4, pc} + + FUNC_END aeabi_frsub + +#endif /* L_arm_addsubsf3 */ + diff --git a/libgcc/config/arm/lib1funcs.S b/libgcc/config/arm/lib1funcs.S index 188d9d7ff47..d1a2d2f7908 100644 --- a/libgcc/config/arm/lib1funcs.S +++ b/libgcc/config/arm/lib1funcs.S @@ -2012,6 +2012,7 @@ LSYM(Lchange_\register): #include "bpabi-v6m.S" #include "eabi/fplib.h" #include "eabi/fcmp.S" +#include "eabi/fadd.S" #endif /* NOT_ISA_TARGET_32BIT */ #include "eabi/lcmp.S" #endif /* !__symbian__ */ -- 2.34.1
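
For reference, the moved __aeabi_frsub() computes b - a by flipping
the sign bit of the first operand and delegating to the ordinary
addition routine.  A sketch in C (union type-punning assumed to be
well-defined, as it is under GCC):

float
frsub_sketch (float a, float b)
{
  union { float f; unsigned int u; } t = { a };
  t.u ^= 0x80000000u;   /* 'eors r0, r4' with r4 == 1 << 31 */
  return t.f + b;       /* 'bl __aeabi_fadd' */
}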
[PATCH v7 24/34] Import float comparison from the CM0 library
These functions are significantly smaller and faster than the wrapper functions and soft-float implementation they replace. Using the first comparison operator (e.g. '<=') in any program costs about 70 bytes initially, but every additional operator incrementally adds just 4 bytes. NOTE: It seems that the __aeabi_cfcmp*() routines formerly in bpabi-v6m.S were not well tested, as they returned wrong results for the 'C' flag. The replacement functions are fully tested. gcc/libgcc/ChangeLog: 2022-10-09 Daniel Engel * config/arm/eabi/fcmp.S (__cmpsf2, __eqsf2, __gesf2, __aeabi_fcmpne, __aeabi_fcmpun): Added new functions. (__aeabi_fcmpeq, __aeabi_fcmpne, __aeabi_fcmplt, __aeabi_fcmple, __aeabi_fcmpge, __aeabi_fcmpgt, __aeabi_cfcmple, __aeabi_cfcmpeq, __aeabi_cfrcmple): Replaced with branches to __internal_cmpsf2(). * config/arm/eabi/fplib.h: New file with fcmp-specific constants and general build configuration macros. * config/arm/lib1funcs.S: #include eabi/fplib.h (v6m only). * config/arm/t-elf (LIB1ASMFUNCS): Added _internal_cmpsf2, _arm_cfcmpeq, _arm_cfcmple, _arm_cfrcmple, _arm_fcmpeq, _arm_fcmpge, _arm_fcmpgt, _arm_fcmple, _arm_fcmplt, _arm_fcmpne, _arm_eqsf2, and _arm_gesf2. --- libgcc/config/arm/eabi/fcmp.S | 643 + libgcc/config/arm/eabi/fplib.h | 83 + libgcc/config/arm/lib1funcs.S | 1 + libgcc/config/arm/t-elf| 18 + 4 files changed, 681 insertions(+), 64 deletions(-) create mode 100644 libgcc/config/arm/eabi/fplib.h diff --git a/libgcc/config/arm/eabi/fcmp.S b/libgcc/config/arm/eabi/fcmp.S index 96d627f1fea..0c813fae8c5 100644 --- a/libgcc/config/arm/eabi/fcmp.S +++ b/libgcc/config/arm/eabi/fcmp.S @@ -1,8 +1,7 @@ -/* Miscellaneous BPABI functions. Thumb-1 implementation, suitable for ARMv4T, - ARMv6-M and ARMv8-M Baseline like ISA variants. +/* fcmp.S: Thumb-1 optimized 32-bit float comparison - Copyright (C) 2006-2020 Free Software Foundation, Inc. - Contributed by CodeSourcery. + Copyright (C) 2018-2022 Free Software Foundation, Inc. + Contributed by Daniel Engel, Senva Inc (g...@danielengel.com) This file is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the @@ -24,66 +23,582 @@ <http://www.gnu.org/licenses/>. */ +// The various compare functions in this file all expect to tail call __cmpsf2() +// with flags set for a particular comparison mode. The __internal_cmpsf2() +// symbol itself is unambiguous, but there is a remote risk that the linker +// will prefer some other symbol in place of __cmpsf2(). Importing an archive +// file that also exports __cmpsf2() will throw an error in this case. +// As a workaround, this block configures __aeabi_f2lz() for compilation twice. +// The first version configures __internal_cmpsf2() as a WEAK standalone symbol, +// and the second exports __cmpsf2() and __internal_cmpsf2() normally. +// A small bonus: programs not using __cmpsf2() itself will be slightly smaller. +// 'L_internal_cmpsf2' should appear before 'L_arm_cmpsf2' in LIB1ASMFUNCS. +#if defined(L_arm_cmpsf2) || defined(L_internal_cmpsf2) + +#define CMPSF2_SECTION .text.sorted.libgcc.fcmp.cmpsf2 + +// int __cmpsf2(float, float) +// <https://gcc.gnu.org/onlinedocs/gccint/Soft-float-library-routines.html> +// Returns the three-way comparison result of $r0 with $r1: +// * +1 if ($r0 > $r1), or either argument is NAN +// * 0 if ($r0 == $r1) +// * -1 if ($r0 < $r1) +// Uses $r2, $r3, and $ip as scratch space. 
+#ifdef L_arm_cmpsf2 +FUNC_START_SECTION cmpsf2 CMPSF2_SECTION +FUNC_ALIAS lesf2 cmpsf2 +FUNC_ALIAS ltsf2 cmpsf2 +CFI_START_FUNCTION + +// Assumption: The 'libgcc' functions should raise exceptions. +movsr2, #(FCMP_UN_POSITIVE + FCMP_RAISE_EXCEPTIONS + FCMP_3WAY) + +// int,int __internal_cmpsf2(float, float, int) +// Internal function expects a set of control flags in $r2. +// If ordered, returns a comparison type { 0, 1, 2 } in $r3 +FUNC_ENTRY internal_cmpsf2 + +#else /* L_internal_cmpsf2 */ +WEAK_START_SECTION internal_cmpsf2 CMPSF2_SECTION +CFI_START_FUNCTION + +#endif + +// When operand signs are considered, the comparison result falls +// within one of the following quadrants: +// +// $r0 $r1 $r0-$r1* flags result +// ++ > C=0 GT +// ++ = Z=1 EQ +// ++ < C=1 LT +// +- > C=1 GT +// +- = C=1 GT +// +- < C=1 GT +// -+ > C=0 LT +// -+ = C=0 LT +// -+ < C=0 LT +// -- > C=0
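
The contract stated above, restated as C for reference (sketch only;
the flag-based quadrant logic in the assembly produces the same
mapping without examining the operands as floats):

int
cmpsf2_sketch (float a, float b)
{
  if (a != a || b != b)   /* unordered: either argument is NAN */
    return 1;
  if (a < b)
    return -1;
  return (a > b) ? 1 : 0;
}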
[PATCH v7 20/34] Refactor Thumb-1 64-bit division into a new file
gcc/libgcc/ChangeLog: 2022-10-09 Daniel Engel * config/arm/bpabi-v6m.S (__aeabi_ldivmod/ldivmod): Moved to ... * config/arm/eabi/ldiv.S: New file. * config/arm/lib1funcs.S: #include eabi/ldiv.S (v6m only). --- libgcc/config/arm/bpabi-v6m.S | 81 - libgcc/config/arm/eabi/ldiv.S | 107 ++ libgcc/config/arm/lib1funcs.S | 1 + 3 files changed, 108 insertions(+), 81 deletions(-) create mode 100644 libgcc/config/arm/eabi/ldiv.S diff --git a/libgcc/config/arm/bpabi-v6m.S b/libgcc/config/arm/bpabi-v6m.S index 3757e99508e..d38a9208c60 100644 --- a/libgcc/config/arm/bpabi-v6m.S +++ b/libgcc/config/arm/bpabi-v6m.S @@ -34,87 +34,6 @@ #endif /* __ARM_EABI__ */ -.macro test_div_by_zero signed - cmp yyh, #0 - bne 7f - cmp yyl, #0 - bne 7f - cmp xxh, #0 - .ifc\signed, unsigned - bne 2f - cmp xxl, #0 -2: - beq 3f - movsxxh, #0 - mvnsxxh, xxh@ 0x - movsxxl, xxh -3: - .else - blt 6f - bgt 4f - cmp xxl, #0 - beq 5f -4: movsxxl, #0 - mvnsxxl, xxl@ 0x - lsrsxxh, xxl, #1@ 0x7fff - b 5f -6: movsxxh, #0x80 - lslsxxh, xxh, #24 @ 0x8000 - movsxxl, #0 -5: - .endif - @ tailcalls are tricky on v6-m. - push{r0, r1, r2} - ldr r0, 1f - adr r1, 1f - addsr0, r1 - str r0, [sp, #8] - @ We know we are not on armv4t, so pop pc is safe. - pop {r0, r1, pc} - .align 2 -1: - .word __aeabi_ldiv0 - 1b -7: -.endm - -#ifdef L_aeabi_ldivmod - -FUNC_START aeabi_ldivmod - test_div_by_zero signed - - push{r0, r1} - mov r0, sp - push{r0, lr} - ldr r0, [sp, #8] - bl SYM(__gnu_ldivmod_helper) - ldr r3, [sp, #4] - mov lr, r3 - add sp, sp, #8 - pop {r2, r3} - RET - FUNC_END aeabi_ldivmod - -#endif /* L_aeabi_ldivmod */ - -#ifdef L_aeabi_uldivmod - -FUNC_START aeabi_uldivmod - test_div_by_zero unsigned - - push{r0, r1} - mov r0, sp - push{r0, lr} - ldr r0, [sp, #8] - bl SYM(__udivmoddi4) - ldr r3, [sp, #4] - mov lr, r3 - add sp, sp, #8 - pop {r2, r3} - RET - FUNC_END aeabi_uldivmod - -#endif /* L_aeabi_uldivmod */ - #ifdef L_arm_addsubsf3 FUNC_START aeabi_frsub diff --git a/libgcc/config/arm/eabi/ldiv.S b/libgcc/config/arm/eabi/ldiv.S new file mode 100644 index 000..3c8280ef580 --- /dev/null +++ b/libgcc/config/arm/eabi/ldiv.S @@ -0,0 +1,107 @@ +/* Miscellaneous BPABI functions. Thumb-1 implementation, suitable for ARMv4T, + ARMv6-M and ARMv8-M Baseline like ISA variants. + + Copyright (C) 2006-2020 Free Software Foundation, Inc. + Contributed by CodeSourcery. + + This file is free software; you can redistribute it and/or modify it + under the terms of the GNU General Public License as published by the + Free Software Foundation; either version 3, or (at your option) any + later version. + + This file is distributed in the hope that it will be useful, but + WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + General Public License for more details. + + Under Section 7 of GPL version 3, you are granted additional + permissions described in the GCC Runtime Library Exception, version + 3.1, as published by the Free Software Foundation. + + You should have received a copy of the GNU General Public License and + a copy of the GCC Runtime Library Exception along with this program; + see the files COPYING3 and COPYING.RUNTIME respectively. If not, see + <http://www.gnu.org/licenses/>. 
*/ + + +.macro test_div_by_zero signed +cmp yyh, #0 +bne 7f +cmp yyl, #0 +bne 7f +cmp xxh, #0 +.ifc\signed, unsigned +bne 2f +cmp xxl, #0 +2: +beq 3f +movsxxh, #0 +mvnsxxh, xxh@ 0x +movsxxl, xxh +3: +.else +blt 6f +bgt 4f +cmp xxl, #0 +beq 5f +4: movsxxl, #0 +mvnsxxl, xxl@ 0x +lsrsxxh, xxl, #1@ 0x7fff +b 5f +6: movsxxh, #0x80 +lslsxxh, xxh, #24 @ 0x8000 +movsxxl, #0 +5: +.endif +@ tailcalls are tricky on v6-m. +push{r0, r1, r2} +ldr r0, 1f +adr r1, 1f +addsr0, r1 +str r0, [sp, #8] +@ We know we are not on armv4t,
[PATCH v7 23/34] Refactor Thumb-1 float comparison into a new file
gcc/libgcc/ChangeLog: 2022-10-09 Daniel Engel * config/arm/bpabi-v6m.S (__aeabi_cfcmpeq, __aeabi_cfcmple, __aeabi_cfrcmple, __aeabi_fcmpeq, __aeabi_fcmple, aeabi_fcmple, __aeabi_fcmpgt, aeabi_fcmpge): Moved to ... * config/arm/eabi/fcmp.S: New file. * config/arm/lib1funcs.S: #include eabi/fcmp.S (v6m only). --- libgcc/config/arm/bpabi-v6m.S | 63 - libgcc/config/arm/eabi/fcmp.S | 89 +++ libgcc/config/arm/lib1funcs.S | 1 + 3 files changed, 90 insertions(+), 63 deletions(-) create mode 100644 libgcc/config/arm/eabi/fcmp.S diff --git a/libgcc/config/arm/bpabi-v6m.S b/libgcc/config/arm/bpabi-v6m.S index d38a9208c60..8e0a45f4716 100644 --- a/libgcc/config/arm/bpabi-v6m.S +++ b/libgcc/config/arm/bpabi-v6m.S @@ -49,69 +49,6 @@ FUNC_START aeabi_frsub #endif /* L_arm_addsubsf3 */ -#ifdef L_arm_cmpsf2 - -FUNC_START aeabi_cfrcmple - - mov ip, r0 - movsr0, r1 - mov r1, ip - b 6f - -FUNC_START aeabi_cfcmpeq -FUNC_ALIAS aeabi_cfcmple aeabi_cfcmpeq - - @ The status-returning routines are required to preserve all - @ registers except ip, lr, and cpsr. -6: push{r0, r1, r2, r3, r4, lr} - bl __lesf2 - @ Set the Z flag correctly, and the C flag unconditionally. - cmp r0, #0 - @ Clear the C flag if the return value was -1, indicating - @ that the first operand was smaller than the second. - bmi 1f - movsr1, #0 - cmn r0, r1 -1: - pop {r0, r1, r2, r3, r4, pc} - - FUNC_END aeabi_cfcmple - FUNC_END aeabi_cfcmpeq - FUNC_END aeabi_cfrcmple - -FUNC_START aeabi_fcmpeq - - push{r4, lr} - bl __eqsf2 - negsr0, r0 - addsr0, r0, #1 - pop {r4, pc} - - FUNC_END aeabi_fcmpeq - -.macro COMPARISON cond, helper, mode=sf2 -FUNC_START aeabi_fcmp\cond - - push{r4, lr} - bl __\helper\mode - cmp r0, #0 - b\cond 1f - movsr0, #0 - pop {r4, pc} -1: - movsr0, #1 - pop {r4, pc} - - FUNC_END aeabi_fcmp\cond -.endm - -COMPARISON lt, le -COMPARISON le, le -COMPARISON gt, ge -COMPARISON ge, ge - -#endif /* L_arm_cmpsf2 */ - #ifdef L_arm_addsubdf3 FUNC_START aeabi_drsub diff --git a/libgcc/config/arm/eabi/fcmp.S b/libgcc/config/arm/eabi/fcmp.S new file mode 100644 index 000..96d627f1fea --- /dev/null +++ b/libgcc/config/arm/eabi/fcmp.S @@ -0,0 +1,89 @@ +/* Miscellaneous BPABI functions. Thumb-1 implementation, suitable for ARMv4T, + ARMv6-M and ARMv8-M Baseline like ISA variants. + + Copyright (C) 2006-2020 Free Software Foundation, Inc. + Contributed by CodeSourcery. + + This file is free software; you can redistribute it and/or modify it + under the terms of the GNU General Public License as published by the + Free Software Foundation; either version 3, or (at your option) any + later version. + + This file is distributed in the hope that it will be useful, but + WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + General Public License for more details. + + Under Section 7 of GPL version 3, you are granted additional + permissions described in the GCC Runtime Library Exception, version + 3.1, as published by the Free Software Foundation. + + You should have received a copy of the GNU General Public License and + a copy of the GCC Runtime Library Exception along with this program; + see the files COPYING3 and COPYING.RUNTIME respectively. If not, see + <http://www.gnu.org/licenses/>. */ + + +#ifdef L_arm_cmpsf2 + +FUNC_START aeabi_cfrcmple + + mov ip, r0 + movsr0, r1 + mov r1, ip + b 6f + +FUNC_START aeabi_cfcmpeq +FUNC_ALIAS aeabi_cfcmple aeabi_cfcmpeq + + @ The status-returning routines are required to preserve all + @ registers except ip, lr, and cpsr. 
+6: push{r0, r1, r2, r3, r4, lr} + bl __lesf2 + @ Set the Z flag correctly, and the C flag unconditionally. + cmp r0, #0 + @ Clear the C flag if the return value was -1, indicating + @ that the first operand was smaller than the second. + bmi 1f + movsr1, #0 + cmn r0, r1 +1: + pop {r0, r1, r2, r3, r4, pc} + + FUNC_END aeabi_cfcmple + FUNC_END aeabi_cfcmpeq + FUNC_END aeabi_cfrcmple + +FUNC_START aeabi_fcmpeq + + push{r4, lr} + bl __eqsf2 + negsr0, r0 + addsr0, r0, #1 + pop {r4, pc} + + FUNC_END aeabi_fcmpeq + +.macro COMPARISON cond, helper, mode=sf2 +FUNC_START aeabi_fcmp\cond + + push{r4, lr} + bl __\helper\mode + cmp r0, #0 + b\cond 1f + movsr0, #0 + pop {r4, pc} +1: + movs
[PATCH v7 26/34] Import float addition and subtraction from the CM0 library
Since this is the first import of single-precision functions, some
common parsing and formatting routines are also included.  These
common routines will be referenced by other functions in subsequent
commits.  However, even if the size penalty is accounted entirely to
__addsf3(), the total compiled size is still less than half the size
of soft-float.

gcc/libgcc/ChangeLog:
2022-10-09  Daniel Engel

        * config/arm/eabi/fadd.S (__addsf3, __subsf3): Added new functions.
        * config/arm/eabi/fneg.S (__negsf2): Added new file.
        * config/arm/eabi/futil.S (__fp_normalize2, __fp_lalign2,
        __fp_assemble, __fp_overflow, __fp_zero, __fp_check_nan):
        Added new file with shared helper functions.
        * config/arm/lib1funcs.S: #include eabi/fneg.S and eabi/futil.S
        (v6m only).
        * config/arm/t-elf (LIB1ASMFUNCS): Added _arm_addsf3, _arm_frsubsf3,
        _fp_exceptionf, _fp_checknanf, _fp_assemblef, and _fp_normalizef.
---
 libgcc/config/arm/eabi/fadd.S  | 306 +++-
 libgcc/config/arm/eabi/fneg.S  |  76 ++
 libgcc/config/arm/eabi/fplib.h |   3 -
 libgcc/config/arm/eabi/futil.S | 418 +
 libgcc/config/arm/lib1funcs.S  |   2 +
 libgcc/config/arm/t-elf        |   6 +
 6 files changed, 798 insertions(+), 13 deletions(-)
 create mode 100644 libgcc/config/arm/eabi/fneg.S
 create mode 100644 libgcc/config/arm/eabi/futil.S

diff --git a/libgcc/config/arm/eabi/fadd.S b/libgcc/config/arm/eabi/fadd.S
index fffbd91d1bc..176e330a1b6 100644
--- a/libgcc/config/arm/eabi/fadd.S
+++ b/libgcc/config/arm/eabi/fadd.S
@@ -1,5 +1,7 @@
-/* Copyright (C) 2006-2021 Free Software Foundation, Inc.
-   Contributed by CodeSourcery.
+/* fadd.S: Thumb-1 optimized 32-bit float addition and subtraction
+
+   Copyright (C) 2018-2022 Free Software Foundation, Inc.
+   Contributed by Daniel Engel, Senva Inc (g...@danielengel.com)
 
    This file is free software; you can redistribute it and/or modify it
    under the terms of the GNU General Public License as published by the
@@ -21,18 +23,302 @@
    <http://www.gnu.org/licenses/>.  */
 
 
+#ifdef L_arm_frsubsf3
+
+// float __aeabi_frsub(float, float)
+// Returns the floating point difference of $r1 - $r0 in $r0.
+// Subsection ordering within fpcore keeps conditional branches within range.
+FUNC_START_SECTION aeabi_frsub .text.sorted.libgcc.fpcore.b.frsub
+CFI_START_FUNCTION
+
+  #if defined(STRICT_NANS) && STRICT_NANS
+        // Check if $r0 is NAN before modifying.
+        lsls    r2, r0, #1
+        movs    r3, #255
+        lsls    r3, #24
+
+        // Let fadd() find the NAN in the normal course of operation,
+        // moving it to $r0 and checking the quiet/signaling bit.
+        cmp     r2, r3
+        bhi     SYM(__aeabi_fadd)
+  #endif
+
+        // Flip sign and run through fadd().
+        movs    r2, #1
+        lsls    r2, #31
+        adds    r0, r2
+        b       SYM(__aeabi_fadd)
+
+CFI_END_FUNCTION
+FUNC_END aeabi_frsub
+
+#endif /* L_arm_frsubsf3 */
+
+
 #ifdef L_arm_addsubsf3
 
-FUNC_START aeabi_frsub
+// float __aeabi_fsub(float, float)
+// Returns the floating point difference of $r0 - $r1 in $r0.
+// Subsection ordering within fpcore keeps conditional branches within range.
+FUNC_START_SECTION aeabi_fsub .text.sorted.libgcc.fpcore.c.faddsub
+FUNC_ALIAS subsf3 aeabi_fsub
+CFI_START_FUNCTION
 
-        push    {r4, lr}
-        movs    r4, #1
-        lsls    r4, #31
-        eors    r0, r0, r4
-        bl      __aeabi_fadd
-        pop     {r4, pc}
+  #if defined(STRICT_NANS) && STRICT_NANS
+        // Check if $r1 is NAN before modifying.
+        lsls    r2, r1, #1
+        movs    r3, #255
+        lsls    r3, #24
 
-        FUNC_END aeabi_frsub
+        // Let fadd() find the NAN in the normal course of operation,
+        // moving it to $r0 and checking the quiet/signaling bit.
+        cmp     r2, r3
+        bhi     SYM(__aeabi_fadd)
+  #endif
+
+        // Flip sign and fall into fadd().
+movsr2, #1 +lslsr2, #31 +addsr1, r2 #endif /* L_arm_addsubsf3 */ + +// The execution of __subsf3() flows directly into __addsf3(), such that +// instructions must appear consecutively in the same memory section. +// However, this construction inhibits the ability to discard __subsf3() +// when only using __addsf3(). +// Therefore, this block configures __addsf3() for compilation twice. +// The first version is a minimal standalone implementation, and the second +// version is the continuation of __subsf3(). The standalone version must +// be declared WEAK, so that the combined version can supersede it and +// provide both symbols when required. +// '_arm_addsf3' should appear before '_arm_addsubsf3' in LIB1ASMFUNCS. +#if defined(L_arm_addsf3) || defined(L_arm_addsubsf3) + +#ifdef L_arm_addsf3 +// float __aeabi_fadd(fl
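
The subtraction entry point above only flips the sign of the second
operand before falling into fadd; a NAN must be passed through
unmodified so that fadd can propagate it.  A sketch of that behavior
(union type-punning assumed to be well-defined, as it is under GCC):

float
fsub_sketch (float a, float b)
{
  union { float f; unsigned int u; } t = { b };
  if ((t.u << 1) > 0xFF000000u)   /* NAN: max exponent, nonzero mantissa */
    return a + b;                 /* let fadd find and return the NAN */
  t.u ^= 0x80000000u;             /* a - b == a + (-b) */
  return a + t.f;
}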
[PATCH v7 33/34] Drop single-precision Thumb-1 soft-float functions
function symbols first: _subQQ.o, _cmpQQ.o, etc. The fixed-point archive elements appear after the _arm_* archive elements, so the initial definitions of the floating point functions are discarded. However, the fixed-point functions contain unresolved symbol references which the linker registers progressively. Given that the default libgcc.a does not build the soft-point library [1], the linker cannot import any floating point objects until the second pass. However, when v6-m/nofp/libgcc.a _does_ include the soft-point library, the linker proceeds to import some floating point objects during the first pass. To test this theory, add explicit symbol references to convert-sat.c: --- a/gcc/testsuite/gcc.dg/fixed-point/convert-sat.c +++ b/gcc/testsuite/gcc.dg/fixed-point/convert-sat.c @@ -11,6 +11,12 @@ extern void abort (void); int main () { + volatile float a = 1.0; + volatile float b = 2.0; + volatile float c = a * b; + volatile double d = a; + volatile int e = a; + SAT_CONV1 (short _Accum, hk); SAT_CONV1 (_Accum, k); SAT_CONV1 (long _Accum, lk); Afterwards, the linker imports the expected symbols: ... ==> (/home/mirdan/gcc-obj/gcc/thumb/v6-m/nofp/libgcc.a)_arm_mulsf3.o ==> (/home/mirdan/gcc-obj/gcc/thumb/v6-m/nofp/libgcc.a)_muldi3.o ==> (/home/mirdan/gcc-obj/gcc/thumb/v6-m/nofp/libgcc.a)_arm_fixsfsi.o (/home/mirdan/gcc-obj/gcc/thumb/v6-m/nofp/libgcc.a)_arm_f2d.o (/home/mirdan/gcc-obj/gcc/thumb/v6-m/nofp/libgcc.a)_fp_exceptionf.o (/home/mirdan/gcc-obj/gcc/thumb/v6-m/nofp/libgcc.a)_fp_assemblef.o (/home/mirdan/gcc-obj/gcc/thumb/v6-m/nofp/libgcc.a)_fp_normalizef.o ... (/home/mirdan/gcc-obj/gcc/thumb/v6-m/nofp/libgcc.a)muldf3.o (/home/mirdan/gcc-obj/gcc/thumb/v6-m/nofp/libgcc.a)fixdfsi.o (/home/mirdan/gcc-obj/gcc/thumb/v6-m/nofp/libgcc.a)_clzsi2.o (/home/mirdan/gcc-obj/gcc/thumb/v6-m/nofp/libgcc.a)_arm_fixunssfsi.o (/home/mirdan/gcc-obj/gcc/thumb/v6-m/nofp/libgcc.a)_arm_fcmpge.o (/home/mirdan/gcc-obj/gcc/thumb/v6-m/nofp/libgcc.a)_arm_fcmple.o (/home/mirdan/gcc-obj/gcc/thumb/v6-m/nofp/libgcc.a)_arm_fixsfdi.o (/home/mirdan/gcc-obj/gcc/thumb/v6-m/nofp/libgcc.a)_arm_fixunssfdi.o (/home/mirdan/gcc-obj/gcc/thumb/v6-m/nofp/libgcc.a)_arm_cmpdf2.o (/home/mirdan/gcc-obj/gcc/thumb/v6-m/nofp/libgcc.a)_fixunsdfsi.o (/home/mirdan/gcc-obj/gcc/thumb/v6-m/nofp/libgcc.a)_fixdfdi.o (/home/mirdan/gcc-obj/gcc/thumb/v6-m/nofp/libgcc.a)_fixunsdfdi.o (/home/mirdan/gcc-obj/gcc/thumb/v6-m/nofp/libgcc.a)eqdf2.o (/home/mirdan/gcc-obj/gcc/thumb/v6-m/nofp/libgcc.a)gedf2.o (/home/mirdan/gcc-obj/gcc/thumb/v6-m/nofp/libgcc.a)ledf2.o (/home/mirdan/gcc-obj/gcc/thumb/v6-m/nofp/libgcc.a)subdf3.o (/home/mirdan/gcc-obj/gcc/thumb/v6-m/nofp/libgcc.a)floatunsidf.o (/home/mirdan/gcc-obj/gcc/thumb/v6-m/nofp/libgcc.a)_arm_cmpsf2.o ... At a minimum this behavior results in the use of non-preferred code in an affected application. However, as long as each object exports a single entry point, this does not automatically result in a build failure. Indeed, in the case of __aeabi_fmul() and __aeabi_f2d(), all references seem to resolve uniformly in favor of the soft-float library. The first pass that imports the soft-float version of __aeabi_f2iz() also succeeds. However, the first pass fails to find __aeabi_f2uiz(), since the soft-float library does not implement this variant. So, this symbol remains undefined until the second pass. However, the assembly version of __aeabi_f2uiz() the linker finds happens to be implemented as a branch to __internal_f2iz() [2]. 
But the linker, importing __internal_f2iz(), also finds the main entry point __aeabi_f2iz(). And, since __aeabi_f2iz() was already found in the soft-float library, the linker throws an error. The solution is two-fold. First, the assembly routines have separately been made robust against this potential error condition (by weakening and splitting symbols). Second, this commit to block single-precision functions from the soft-float library makes it impossible for the linker to select a non-preferred version. Two duplicate symbols remain (extendsfdf2) and (truncdfsf2), but the situation is much improved. [1] softfp_wrap_start = "#if !__ARM_ARCH_ISA_ARM && __ARM_ARCH_ISA_THUMB == 1" [2] (These operations share a substantial portion of their code path, so this choice leads to a size reduction in programs that use both functions.) gcc/libgcc/ChangeLog: 2022-10-09 Daniel Engel * config/arm/t-softfp (softfp_float_modes): Added as "df". --- libgcc/config/arm/t-softfp | 2 ++ 1 file changed, 2 insertions(+) diff --git a/libgcc/config/arm/t-softfp b/libgcc/config/arm/t-softfp index 554ec9bc47b..bd6a4642e5f
[PATCH v7 27/34] Import float multiplication from the CM0 library
gcc/libgcc/ChangeLog:
2022-10-09 Daniel Engel

	* config/arm/eabi/fmul.S (__mulsf3): New file.
	* config/arm/lib1funcs.S: #include eabi/fmul.S (v6m only).
	* config/arm/t-elf (LIB1ASMFUNCS): Moved _mulsf3 to global scope
	(this object was previously blocked on v6m builds).

---
 libgcc/config/arm/eabi/fmul.S | 215 ++
 libgcc/config/arm/lib1funcs.S |   1 +
 libgcc/config/arm/t-elf       |   3 +-
 3 files changed, 218 insertions(+), 1 deletion(-)
 create mode 100644 libgcc/config/arm/eabi/fmul.S

diff --git a/libgcc/config/arm/eabi/fmul.S b/libgcc/config/arm/eabi/fmul.S
new file mode 100644
index 000..4ebd5a66f47
--- /dev/null
+++ b/libgcc/config/arm/eabi/fmul.S
@@ -0,0 +1,215 @@
+/* fmul.S: Thumb-1 optimized 32-bit float multiplication
+
+   Copyright (C) 2018-2022 Free Software Foundation, Inc.
+   Contributed by Daniel Engel, Senva Inc (g...@danielengel.com)
+
+   This file is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by the
+   Free Software Foundation; either version 3, or (at your option) any
+   later version.
+
+   This file is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>. */
+
+
+#ifdef L_arm_mulsf3
+
+// float __aeabi_fmul(float, float)
+// Returns $r0 after multiplication by $r1.
+// Subsection ordering within fpcore keeps conditional branches within range.
+FUNC_START_SECTION aeabi_fmul .text.sorted.libgcc.fpcore.m.fmul
+FUNC_ALIAS mulsf3 aeabi_fmul
+    CFI_START_FUNCTION
+
+        // Standard registers, compatible with exception handling.
+        push    { rT, lr }
+                .cfi_remember_state
+                .cfi_remember_state
+                .cfi_adjust_cfa_offset 8
+                .cfi_rel_offset rT, 0
+                .cfi_rel_offset lr, 4
+
+        // Save the sign of the result.
+        movs    rT, r1
+        eors    rT, r0
+        lsrs    rT, #31
+        lsls    rT, #31
+        mov     ip, rT
+
+        // Set up INF for comparison.
+        movs    rT, #255
+        lsls    rT, #24
+
+        // Check for multiplication by zero.
+        lsls    r2, r0, #1
+        beq     LLSYM(__fmul_zero1)
+
+        lsls    r3, r1, #1
+        beq     LLSYM(__fmul_zero2)
+
+        // Check for INF/NAN.
+        cmp     r3, rT
+        bhs     LLSYM(__fmul_special2)
+
+        cmp     r2, rT
+        bhs     LLSYM(__fmul_special1)
+
+        // Because neither operand is INF/NAN, the result will be finite.
+        // It is now safe to modify the original operand registers.
+        lsls    r0, #9
+
+        // Isolate the first exponent.  When normal, add back the implicit '1'.
+        // The result is always aligned with the MSB in bit [31].
+        // Subnormal mantissas remain effectively multiplied by 2x relative to
+        // normals, but this works because the weight of a subnormal is -126.
+        lsrs    r2, #24
+        beq     LLSYM(__fmul_normalize2)
+        adds    r0, #1
+        rors    r0, r0
+
+LLSYM(__fmul_normalize2):
+        // IMPORTANT: exp10i() jumps in here!
+        // Repeat for the mantissa of the second operand.
+        // Short-circuit when the mantissa is 1.0, as the
+        // first mantissa is already prepared in $r0.
+        lsls    r1, #9
+
+        // When normal, add back the implicit '1'.
+        lsrs    r3, #24
+        beq     LLSYM(__fmul_go)
+        adds    r1, #1
+        rors    r1, r1
+
+LLSYM(__fmul_go):
+        // Calculate the final exponent, relative to bit [30].
+        adds    rT, r2, r3
+        subs    rT, #127
+
+  #if !defined(__OPTIMIZE_SIZE__) || !__OPTIMIZE_SIZE__
+        // Short-circuit on multiplication by powers of 2.
+        lsls    r3, r0, #1
+        beq     LLSYM(__fmul_simple1)
+
+        lsls    r3, r1, #1
+        beq     LLSYM(__fmul_simple2)
+  #endif
+
+        // Save $ip across the call.
+        // (Alternatively, could push/pop a separate register, but the four
+        //  instructions here are equally fast without imposing on the stack.)
+        add     rT, ip
+
+        // 32x32 unsigned multiplication, 64 bit result.
+        bl
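For readers following the algorithm, the exponent step at the top of this fragment ("adds rT, r2, r3" / "subs rT, #127") reduces to a standard identity; a minimal C sketch, with an invented helper name:

/* With IEEE-754 single-precision bias B = 127, the product of operands
   with biased exponents e1 and e2 has biased exponent e1 + e2 - B,
   before the post-multiplication normalization step adjusts it.  */
static inline int
product_biased_exponent (int e1, int e2)
{
  return e1 + e2 - 127;
}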
[PATCH v7 28/34] Import float division from the CM0 library
gcc/libgcc/ChangeLog:
2022-10-09 Daniel Engel

	* config/arm/eabi/fdiv.S (__divsf3, __fp_divloopf): New file.
	* config/arm/lib1funcs.S: #include eabi/fdiv.S (v6m only).
	* config/arm/t-elf (LIB1ASMFUNCS): Added _divsf3 and _fp_divloopf.

---
 libgcc/config/arm/eabi/fdiv.S | 261 ++
 libgcc/config/arm/lib1funcs.S |   1 +
 libgcc/config/arm/t-elf       |   2 +
 3 files changed, 264 insertions(+)
 create mode 100644 libgcc/config/arm/eabi/fdiv.S

diff --git a/libgcc/config/arm/eabi/fdiv.S b/libgcc/config/arm/eabi/fdiv.S
new file mode 100644
index 000..a6d73892b6d
--- /dev/null
+++ b/libgcc/config/arm/eabi/fdiv.S
@@ -0,0 +1,261 @@
+/* fdiv.S: Thumb-1 optimized 32-bit float division
+
+   Copyright (C) 2018-2022 Free Software Foundation, Inc.
+   Contributed by Daniel Engel, Senva Inc (g...@danielengel.com)
+
+   This file is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by the
+   Free Software Foundation; either version 3, or (at your option) any
+   later version.
+
+   This file is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>. */
+
+
+#ifdef L_arm_divsf3
+
+// float __aeabi_fdiv(float, float)
+// Returns $r0 after division by $r1.
+// Subsection ordering within fpcore keeps conditional branches within range.
+FUNC_START_SECTION aeabi_fdiv .text.sorted.libgcc.fpcore.n.fdiv
+FUNC_ALIAS divsf3 aeabi_fdiv
+    CFI_START_FUNCTION
+
+        // Standard registers, compatible with exception handling.
+        push    { rT, lr }
+                .cfi_remember_state
+                .cfi_remember_state
+                .cfi_adjust_cfa_offset 8
+                .cfi_rel_offset rT, 0
+                .cfi_rel_offset lr, 4
+
+        // Save the sign of the result.
+        movs    r3, r1
+        eors    r3, r0
+        lsrs    rT, r3, #31
+        lsls    rT, #31
+        mov     ip, rT
+
+        // Set up INF for comparison.
+        movs    rT, #255
+        lsls    rT, #24
+
+        // Check for divide by 0.  Automatically catches 0/0.
+        lsls    r2, r1, #1
+        beq     LLSYM(__fdiv_by_zero)
+
+        // Check for INF/INF, or a number divided by itself.
+        lsls    r3, #1
+        beq     LLSYM(__fdiv_equal)
+
+        // Check the numerator for INF/NAN.
+        eors    r3, r2
+        cmp     r3, rT
+        bhs     LLSYM(__fdiv_special1)
+
+        // Check the denominator for INF/NAN.
+        cmp     r2, rT
+        bhs     LLSYM(__fdiv_special2)
+
+        // Check the numerator for zero.
+        cmp     r3, #0
+        beq     SYM(__fp_zero)
+
+        // No action if the numerator is subnormal.
+        // The mantissa will normalize naturally in the division loop.
+        lsls    r0, #9
+        lsrs    r1, r3, #24
+        beq     LLSYM(__fdiv_denominator)
+
+        // Restore the numerator's implicit '1'.
+        adds    r0, #1
+        rors    r0, r0
+
+LLSYM(__fdiv_denominator):
+        // The denominator must be normalized and left aligned.
+        bl      SYM(__fp_normalize2)
+
+        // 25 bits of precision will be sufficient.
+        movs    rT, #64
+
+        // Run division.
+        bl      SYM(__fp_divloopf)
+        b       SYM(__fp_assemble)
+
+LLSYM(__fdiv_equal):
+  #if defined(EXCEPTION_CODES) && EXCEPTION_CODES
+        movs    r3, #(DIVISION_INF_BY_INF)
+  #endif
+
+        // The absolute values of both operands are equal, but not 0.
+        // If both operands are INF, create a new NAN.
+        cmp     r2, rT
+        beq     SYM(__fp_exception)
+
+  #if defined(TRAP_NANS) && TRAP_NANS
+        // If both operands are NAN, return the NAN in $r0.
+        bhi     SYM(__fp_check_nan)
+  #else
+        bhi     LLSYM(__fdiv_return)
+  #endif
+
+        // Return 1.0f, with appropriate sign.
+        movs    r0, #127
+        lsls    r0, #23
+        add     r0, ip
+
+LLSYM(__fdiv_return):
+        pop     { rT, pc }
+                .cfi_restore_state
+
+LLSYM(__fdiv_special2):
+        // The denominator is either INF or NAN, numerator is neither.
+        // Also, the denominator is not equal to 0.
+        // If the d
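The sign handling set up in the prologue of __aeabi_fdiv() follows the usual XOR rule; a C sketch with an invented function name:

/* The sign of a quotient (or product) is the XOR of the operand signs;
   the assembly above parks this bit in ip until the result is
   assembled.  */
static inline unsigned int
quotient_sign_bit (unsigned int a_bits, unsigned int b_bits)
{
  return (a_bits ^ b_bits) & 0x80000000u;
}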
[PATCH v7 31/34] Import float<->double conversion from the CM0 library
gcc/libgcc/ChangeLog:
2022-10-09 Daniel Engel

	* config/arm/eabi/fcast.S (__aeabi_d2f, __aeabi_f2d): New file.
	* config/arm/lib1funcs.S: #include eabi/fcast.S (v6m only).
	* config/arm/t-elf (LIB1ASMFUNCS): Added _arm_d2f and _arm_f2d.

---
 libgcc/config/arm/eabi/fcast.S | 256 +
 libgcc/config/arm/lib1funcs.S  |   1 +
 libgcc/config/arm/t-elf        |   2 +
 3 files changed, 259 insertions(+)
 create mode 100644 libgcc/config/arm/eabi/fcast.S

diff --git a/libgcc/config/arm/eabi/fcast.S b/libgcc/config/arm/eabi/fcast.S
new file mode 100644
index 000..f0d1373d31a
--- /dev/null
+++ b/libgcc/config/arm/eabi/fcast.S
@@ -0,0 +1,256 @@
+/* fcast.S: Thumb-1 optimized 32- and 64-bit float conversions
+
+   Copyright (C) 2018-2022 Free Software Foundation, Inc.
+   Contributed by Daniel Engel, Senva Inc (g...@danielengel.com)
+
+   This file is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by the
+   Free Software Foundation; either version 3, or (at your option) any
+   later version.
+
+   This file is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>. */
+
+
+#ifdef L_arm_f2d
+
+// double __aeabi_f2d(float)
+// Converts a single-precision float in $r0 to double-precision in $r1:$r0.
+// Rounding, overflow, and underflow are impossible.
+// INF and ZERO are returned unmodified.
+FUNC_START_SECTION aeabi_f2d .text.sorted.libgcc.fpcore.v.f2d
+FUNC_ALIAS extendsfdf2 aeabi_f2d
+    CFI_START_FUNCTION
+
+        // Save the sign.
+        lsrs    r1, r0, #31
+        lsls    r1, #31
+
+        // Set up registers for __fp_normalize2().
+        push    { rT, lr }
+                .cfi_remember_state
+                .cfi_adjust_cfa_offset 8
+                .cfi_rel_offset rT, 0
+                .cfi_rel_offset lr, 4
+
+        // Test for zero.
+        lsls    r0, #1
+        beq     LLSYM(__f2d_return)
+
+        // Split the exponent and mantissa into separate registers.
+        // This is the most efficient way to convert subnormals in the
+        // single-precision form into normals in double-precision.
+        // This does add a leading implicit '1' to INF and NAN,
+        // but that will be absorbed when the value is re-assembled.
+        movs    r2, r0
+        bl      SYM(__fp_normalize2) __PLT__
+
+        // Set up the exponent bias.  For INF/NAN values, the bias
+        // is 1791 (2047 - 255 - 1), where the last '1' accounts
+        // for the implicit '1' in the mantissa.
+        movs    r0, #3
+        lsls    r0, #9
+        adds    r0, #255
+
+        // Test for INF/NAN, promote exponent if necessary.
+        cmp     r2, #255
+        beq     LLSYM(__f2d_indefinite)
+
+        // For normal values, the exponent bias is 895 (1023 - 127 - 1),
+        // which is half of the prepared INF/NAN bias.
+        lsrs    r0, #1
+
+LLSYM(__f2d_indefinite):
+        // Assemble exponent with bias correction.
+        adds    r2, r0
+        lsls    r2, #20
+        adds    r1, r2
+
+        // Assemble the high word of the mantissa.
+        lsrs    r0, r3, #11
+        add     r1, r0
+
+        // Remainder of the mantissa in the low word of the result.
+        lsls    r0, r3, #21
+
+LLSYM(__f2d_return):
+        pop     { rT, pc }
+                .cfi_restore_state
+
+    CFI_END_FUNCTION
+FUNC_END extendsfdf2
+FUNC_END aeabi_f2d
+
+#endif /* L_arm_f2d */
+
+
+#if defined(L_arm_d2f) || defined(L_arm_truncdfsf2)
+
+// HACK: Build two separate implementations:
+//  * __aeabi_d2f() rounds to nearest per traditional IEEE-754 rules.
+//  * __truncdfsf2() rounds towards zero per GCC specification.
+// Presumably, a program will consistently use one ABI or the other,
+// which means that code size will not be duplicated in practice.
+// Merging two versions with dynamic rounding would be rather hard.
+#ifdef L_arm_truncdfsf2
+  #define D2F_NAME truncdfsf2
+  #define D2F_SECTION .text.sorted.libgcc.fpcore.x.truncdfsf2
+#else
+  #define D2F_NAME aeabi_d2f
+  #define D2F_SECTION .text.sorted.libgcc.fpcore.w.d2f
+#endif
+
+// float __aeabi_d2f(double)
+// Converts a double-precision float in $r1:$r0 to single-precision in $r0.
/
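A behavioral sketch of the two rounding policies named in the HACK comment, using a value exactly halfway between two adjacent floats (illustrative C only; the real routines manipulate raw bit patterns):

#include <stdio.h>

int main (void)
{
  /* 3 * 2^-24 is 1.5 float-ULPs above 1.0 (float spacing near 1.0
     is 2^-23), i.e. exactly halfway between two adjacent floats.  */
  double d = 1.0 + 3 * 0x1p-24;

  float nearest = (float) d;      /* round-to-nearest, ties-to-even:
                                     1.0f + 0x1p-22f                 */
  float trunc = 1.0f + 0x1p-23f;  /* what round-toward-zero keeps   */

  printf ("%a %a\n", nearest, trunc);
  return 0;
}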
[PATCH v7 29/34] Import integer-to-float conversion from the CM0 library
gcc/libgcc/ChangeLog:
2022-10-09 Daniel Engel

	* config/arm/bpabi-lib.h (__floatdisf, __floatundisf):
	Remove obsolete RENAME_LIBRARY directives.
	* config/arm/eabi/ffloat.S (__aeabi_i2f, __aeabi_l2f, __aeabi_ui2f,
	__aeabi_ul2f): New file.
	* config/arm/lib1funcs.S: #include eabi/ffloat.S (v6m only).
	* config/arm/t-elf (LIB1ASMFUNCS): Added _arm_floatunsisf,
	_arm_floatsisf, and _internal_floatundisf.
	Moved _arm_floatundisf to the weak function group.

---
 libgcc/config/arm/bpabi-lib.h   |   6 -
 libgcc/config/arm/eabi/ffloat.S | 247 
 libgcc/config/arm/lib1funcs.S   |   1 +
 libgcc/config/arm/t-elf         |   5 +-
 4 files changed, 252 insertions(+), 7 deletions(-)
 create mode 100644 libgcc/config/arm/eabi/ffloat.S

diff --git a/libgcc/config/arm/bpabi-lib.h b/libgcc/config/arm/bpabi-lib.h
index 26ad5ffbe8b..7dd78d5668f 100644
--- a/libgcc/config/arm/bpabi-lib.h
+++ b/libgcc/config/arm/bpabi-lib.h
@@ -56,9 +56,6 @@
 #ifdef L_floatdidf
 #define DECLARE_LIBRARY_RENAMES RENAME_LIBRARY (floatdidf, l2d)
 #endif
-#ifdef L_floatdisf
-#define DECLARE_LIBRARY_RENAMES RENAME_LIBRARY (floatdisf, l2f)
-#endif
 /* These renames are needed on ARMv6M.  Other targets get them from
    assembly routines.  */
@@ -71,9 +68,6 @@
 #ifdef L_floatundidf
 #define DECLARE_LIBRARY_RENAMES RENAME_LIBRARY (floatundidf, ul2d)
 #endif
-#ifdef L_floatundisf
-#define DECLARE_LIBRARY_RENAMES RENAME_LIBRARY (floatundisf, ul2f)
-#endif
 /* For ARM bpabi, we only want to use a "__gnu_" prefix for the fixed-point
    helper functions - not everything in libgcc - in the interests of
diff --git a/libgcc/config/arm/eabi/ffloat.S b/libgcc/config/arm/eabi/ffloat.S
new file mode 100644
index 000..c8bc55a24b6
--- /dev/null
+++ b/libgcc/config/arm/eabi/ffloat.S
@@ -0,0 +1,247 @@
+/* ffloat.S: Thumb-1 optimized integer-to-float conversion
+
+   Copyright (C) 2018-2022 Free Software Foundation, Inc.
+   Contributed by Daniel Engel, Senva Inc (g...@danielengel.com)
+
+   This file is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by the
+   Free Software Foundation; either version 3, or (at your option) any
+   later version.
+
+   This file is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>. */
+
+
+#ifdef L_arm_floatsisf
+
+// float __aeabi_i2f(int)
+// Converts a signed integer in $r0 to float.
+
+// On little-endian cores (including all Cortex-M), __floatsisf() can be
+// implemented as below in 5 instructions.  However, it can also be
+// implemented by prefixing a single instruction to __floatdisf().
+// A memory savings of 4 instructions at a cost of only 2 execution cycles
+// seems reasonable enough.  Plus, the trade-off only happens in programs
+// that require both __floatsisf() and __floatdisf().  Programs only using
+// __floatsisf() always get the smallest version.
+// When the combined version is provided, this standalone version
+// must be declared WEAK, so that the combined version can supersede it.
+// '_arm_floatsisf' should appear before '_arm_floatdisf' in LIB1ASMFUNCS.
+// Same parent section as __ul2f() to keep tail call branch within range.
+#if defined(__OPTIMIZE_SIZE__) && __OPTIMIZE_SIZE__
+WEAK_START_SECTION aeabi_i2f .text.sorted.libgcc.fpcore.p.floatsisf
+WEAK_ALIAS floatsisf aeabi_i2f
+    CFI_START_FUNCTION
+
+#else /* !__OPTIMIZE_SIZE__ */
+FUNC_START_SECTION aeabi_i2f .text.sorted.libgcc.fpcore.p.floatsisf
+FUNC_ALIAS floatsisf aeabi_i2f
+    CFI_START_FUNCTION
+
+#endif /* !__OPTIMIZE_SIZE__ */
+
+        // Save the sign.
+        asrs    r3, r0, #31
+
+        // Absolute value of the input.
+        eors    r0, r3
+        subs    r0, r3
+
+        // Sign extension to long long unsigned.
+        eors    r1, r1
+        b       SYM(__internal_floatundisf_noswap)
+
+    CFI_END_FUNCTION
+FUNC_END floatsisf
+FUNC_END aeabi_i2f
+
+#endif /* L_arm_floatsisf */
+
+
+#ifdef L_arm_floatdisf
+
+// float __aeabi_l2f(long long)
+// Converts a signed 64-bit integer in $r1:$r0 to a float in $r0.
+// See build comments for __floatsisf() above.
+// Same parent section as __ul2f() to keep tail call branch within range.
+#if defined(__OPTIMIZE_SIZE
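The absolute-value sequence in __aeabi_i2f() above (asrs/eors/subs) is the classic branchless idiom; a C model, assuming the usual arithmetic behavior of signed right shift on ARM targets:

/* With s = x >> 31 (0 for non-negative x, -1 for negative x),
   |x| = (x ^ s) - s.  For INT_MIN the subtraction wraps to 0x80000000,
   which is exactly the unsigned magnitude the conversion needs.  */
static inline unsigned int
branchless_abs (int x)
{
  int s = x >> 31;
  return (unsigned int) (x ^ s) - (unsigned int) s;
}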
[PATCH v7 30/34] Import float-to-integer conversion from the CM0 library
gcc/libgcc/ChangeLog:
2022-10-09 Daniel Engel

	* config/arm/bpabi-lib.h (muldi3): Removed duplicate.
	(fixunssfsi): Removed obsolete RENAME_LIBRARY directive.
	* config/arm/eabi/ffixed.S (__aeabi_f2iz, __aeabi_f2uiz,
	__aeabi_f2lz, __aeabi_f2ulz): New file.
	* config/arm/lib1funcs.S: #include eabi/ffixed.S (v6m only).
	* config/arm/t-elf (LIB1ASMFUNCS): Added _internal_fixsfdi,
	_internal_fixsfsi, _arm_fixsfdi, and _arm_fixunssfdi.

---
 libgcc/config/arm/bpabi-lib.h   |   6 -
 libgcc/config/arm/eabi/ffixed.S | 414 
 libgcc/config/arm/lib1funcs.S   |   1 +
 libgcc/config/arm/t-elf         |   4 +
 4 files changed, 419 insertions(+), 6 deletions(-)
 create mode 100644 libgcc/config/arm/eabi/ffixed.S

diff --git a/libgcc/config/arm/bpabi-lib.h b/libgcc/config/arm/bpabi-lib.h
index 7dd78d5668f..6425c1bad2a 100644
--- a/libgcc/config/arm/bpabi-lib.h
+++ b/libgcc/config/arm/bpabi-lib.h
@@ -32,9 +32,6 @@
 #ifdef L_muldi3
 #define DECLARE_LIBRARY_RENAMES RENAME_LIBRARY (muldi3, lmul)
 #endif
-#ifdef L_muldi3
-#define DECLARE_LIBRARY_RENAMES RENAME_LIBRARY (muldi3, lmul)
-#endif
 #ifdef L_fixdfdi
 #define DECLARE_LIBRARY_RENAMES RENAME_LIBRARY (fixdfdi, d2lz) \
   extern DWtype __fixdfdi (DFtype) __attribute__((pcs("aapcs"))); \
@@ -62,9 +59,6 @@
 #ifdef L_fixunsdfsi
 #define DECLARE_LIBRARY_RENAMES RENAME_LIBRARY (fixunsdfsi, d2uiz)
 #endif
-#ifdef L_fixunssfsi
-#define DECLARE_LIBRARY_RENAMES RENAME_LIBRARY (fixunssfsi, f2uiz)
-#endif
 #ifdef L_floatundidf
 #define DECLARE_LIBRARY_RENAMES RENAME_LIBRARY (floatundidf, ul2d)
 #endif
diff --git a/libgcc/config/arm/eabi/ffixed.S b/libgcc/config/arm/eabi/ffixed.S
new file mode 100644
index 000..61c8a0fe1fd
--- /dev/null
+++ b/libgcc/config/arm/eabi/ffixed.S
@@ -0,0 +1,414 @@
+/* ffixed.S: Thumb-1 optimized float-to-integer conversion
+
+   Copyright (C) 2018-2022 Free Software Foundation, Inc.
+   Contributed by Daniel Engel, Senva Inc (g...@danielengel.com)
+
+   This file is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by the
+   Free Software Foundation; either version 3, or (at your option) any
+   later version.
+
+   This file is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>. */
+
+
+// The implementation of __aeabi_f2uiz() expects to tail call __internal_f2iz()
+// with the flags register set for unsigned conversion.  The __internal_f2iz()
+// symbol itself is unambiguous, but there is a remote risk that the linker
+// will prefer some other symbol in place of __aeabi_f2iz().  Importing an
+// archive file that exports __aeabi_f2iz() will throw an error in this case.
+// As a workaround, this block configures __aeabi_f2iz() for compilation twice.
+// The first version configures __internal_f2iz() as a WEAK standalone symbol,
+// and the second exports __aeabi_f2iz() and __internal_f2iz() normally.
+// A small bonus: programs only using __aeabi_f2uiz() will be slightly smaller.
+// '_internal_fixsfsi' should appear before '_arm_fixsfsi' in LIB1ASMFUNCS.
+#if defined(L_arm_fixsfsi) || \
+   (defined(L_internal_fixsfsi) && \
+  !(defined(__OPTIMIZE_SIZE__) && __OPTIMIZE_SIZE__))
+
+// Subsection ordering within fpcore keeps conditional branches within range.
+#define F2IZ_SECTION .text.sorted.libgcc.fpcore.r.fixsfsi
+
+// int __aeabi_f2iz(float)
+// Converts a float in $r0 to signed integer, rounding toward 0.
+// Values out of range are forced to either INT_MAX or INT_MIN.
+// NAN becomes zero.
+#ifdef L_arm_fixsfsi
+FUNC_START_SECTION aeabi_f2iz F2IZ_SECTION
+FUNC_ALIAS fixsfsi aeabi_f2iz
+    CFI_START_FUNCTION
+#endif
+
+  #if defined(__OPTIMIZE_SIZE__) && __OPTIMIZE_SIZE__
+        // Flag for unsigned conversion.
+        movs    r1, #33
+        b       SYM(__internal_fixsfdi)
+
+  #else /* !__OPTIMIZE_SIZE__ */
+
+#ifdef L_arm_fixsfsi
+        // Flag for signed conversion.
+        movs    r3, #1
+
+// [unsigned] int internal_f2iz(float, int)
+// Internal function expects a boolean flag in $r1.
+// If the boolean flag is 0, the result is unsigned.
+// If the boolean flag is 1, the result is signed.
+FUNC_ENTRY internal_f2iz
+
+#else /* L_internal_fixsfsi */
WEAK_STAR
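The conversion contract stated in the comments above (round toward zero, saturate out-of-range values, NAN to zero) can be modeled in C as follows; this is a behavioral sketch only, since the library routine operates on raw bits:

#include <limits.h>

int f2iz_model (float f)
{
  if (f != f)
    return 0;                    /* NAN becomes zero              */
  if (f >= 2147483648.0f)        /* 2^31 is exactly representable */
    return INT_MAX;
  if (f < -2147483648.0f)
    return INT_MIN;
  return (int) f;                /* C casts round toward zero     */
}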
[PATCH v7 32/34] Import float<->__fp16 conversion from the CM0 library
gcc/libgcc/ChangeLog:
2022-10-09 Daniel Engel

	* config/arm/eabi/fcast.S (__aeabi_h2f, __aeabi_f2h): Added functions.
	* config/arm/fp16.c (__gnu_f2h_ieee, __gnu_h2f_ieee,
	__gnu_f2h_alternative, __gnu_h2f_alternative):
	Disable build for v6m multilibs.
	* config/arm/t-bpabi (LIB1ASMFUNCS): Added _aeabi_f2h_ieee,
	_aeabi_h2f_ieee, _aeabi_f2h_alt, and _aeabi_h2f_alt (v6m only).

---
 libgcc/config/arm/eabi/fcast.S | 277 +
 libgcc/config/arm/fp16.c       |   4 +
 libgcc/config/arm/t-bpabi      |   7 +
 3 files changed, 288 insertions(+)

diff --git a/libgcc/config/arm/eabi/fcast.S b/libgcc/config/arm/eabi/fcast.S
index f0d1373d31a..09876a95767 100644
--- a/libgcc/config/arm/eabi/fcast.S
+++ b/libgcc/config/arm/eabi/fcast.S
@@ -254,3 +254,280 @@
 FUNC_END D2F_NAME

 #endif /* L_arm_d2f || L_arm_truncdfsf2 */
+
+#if defined(L_aeabi_h2f_ieee) || defined(L_aeabi_h2f_alt)
+
+#ifdef L_aeabi_h2f_ieee
+  #define H2F_NAME aeabi_h2f
+  #define H2F_ALIAS gnu_h2f_ieee
+#else
+  #define H2F_NAME aeabi_h2f_alt
+  #define H2F_ALIAS gnu_h2f_alternative
+#endif
+
+// float __aeabi_h2f(short hf)
+// float __aeabi_h2f_alt(short hf)
+// Converts a half-precision float in $r0 to single-precision.
+// Rounding, overflow, and underflow conditions are impossible.
+// In IEEE mode, INF, ZERO, and NAN are returned unmodified.
+FUNC_START_SECTION H2F_NAME .text.sorted.libgcc.h2f
+FUNC_ALIAS H2F_ALIAS H2F_NAME
+    CFI_START_FUNCTION
+
+        // Set up registers for __fp_normalize2().
+        push    { rT, lr }
+                .cfi_remember_state
+                .cfi_adjust_cfa_offset 8
+                .cfi_rel_offset rT, 0
+                .cfi_rel_offset lr, 4
+
+        // Save the mantissa and exponent.
+        lsls    r2, r0, #17
+
+        // Isolate the sign.
+        lsrs    r0, #15
+        lsls    r0, #31
+
+        // Align the exponent at bit[24] for normalization.
+        // If zero, return the original sign.
+        lsrs    r2, #3
+
+  #ifdef __HAVE_FEATURE_IT
+        do_it   eq
+        RETc(eq)
+  #else
+        beq     LLSYM(__h2f_return)
+  #endif
+
+        // Split the exponent and mantissa into separate registers.
+        // This is the most efficient way to convert subnormals in the
+        // half-precision form into normals in single-precision.
+        // This does add a leading implicit '1' to INF and NAN,
+        // but that will be absorbed when the value is re-assembled.
+        bl      SYM(__fp_normalize2) __PLT__
+
+  #ifdef L_aeabi_h2f_ieee
+        // Set up the exponent bias.  For INF/NAN values, the bias is 223,
+        // where the last '1' accounts for the implicit '1' in the mantissa.
+        adds    r2, #(255 - 31 - 1)
+
+        // Test for INF/NAN.
+        cmp     r2, #254
+
+  #ifdef __HAVE_FEATURE_IT
+        do_it   ne
+  #else
+        beq     LLSYM(__h2f_assemble)
+  #endif
+
+        // For normal values, the bias should have been 111.
+        // However, this offset must be adjusted per the INF check above.
+     IT(sub,ne) r2, #((255 - 31 - 1) - (127 - 15 - 1))
+
+#else /* L_aeabi_h2f_alt */
+        // Set up the exponent bias.  All values are normal.
+        adds    r2, #(127 - 15 - 1)
+#endif
+
+LLSYM(__h2f_assemble):
+        // Combine exponent and sign.
+        lsls    r2, #23
+        adds    r0, r2
+
+        // Combine mantissa.
+        lsrs    r3, #8
+        add     r0, r3
+
+LLSYM(__h2f_return):
+        pop     { rT, pc }
+                .cfi_restore_state
+
+    CFI_END_FUNCTION
+FUNC_END H2F_NAME
+FUNC_END H2F_ALIAS
+
+#endif /* L_aeabi_h2f_ieee || L_aeabi_h2f_alt */
+
+
+#if defined(L_aeabi_f2h_ieee) || defined(L_aeabi_f2h_alt)
+
+#ifdef L_aeabi_f2h_ieee
+  #define F2H_NAME aeabi_f2h
+  #define F2H_ALIAS gnu_f2h_ieee
+#else
+  #define F2H_NAME aeabi_f2h_alt
+  #define F2H_ALIAS gnu_f2h_alternative
+#endif
+
+// short __aeabi_f2h(float f)
+// short __aeabi_f2h_alt(float f)
+// Converts a single-precision float in $r0 to half-precision,
+// rounding to nearest, ties to even.
+// Values out of range are forced to either ZERO or INF.
+// In IEEE mode, the upper 12 bits of a NAN will be preserved.
+FUNC_START_SECTION F2H_NAME .text.sorted.libgcc.f2h
+FUNC_ALIAS F2H_ALIAS F2H_NAME
+    CFI_START_FUNCTION
+
+        // Set up the sign.
+        lsrs    r2, r0, #31
+        lsls    r2, #15
+
+        // Save the exponent and mantissa.
+        // If ZERO, return the original sign.
+        lsls    r0, #1
+
+  #ifdef __HAVE_FEATURE_IT
+        do_it   ne, t
+        addne   r0, r2
+        RETc(ne)
+  #else
+        beq     LLSYM(__f2h_return)
+  #endif
+
+        // Isolate the exponent.
+        lsrs    r1, r0, #24
+
+  #ifdef L_aeabi_f2h_ieee
+        // Check for NAN.
+        cmp     r1, #255
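For reference, the rebias constants used by the half-precision converters follow directly from the IEEE-754 exponent offsets; spelled out in C, with invented names:

/* Exponent offsets: half = 15, single = 127.  The extra '-1'
   compensates for the implicit mantissa bit that __fp_normalize2()
   makes explicit.  */
enum {
  H2F_REBIAS_NORMAL  = 127 - 15 - 1,   /* 111                          */
  H2F_REBIAS_SPECIAL = 255 - 31 - 1    /* 223: maps exponent 31 to 255 */
};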
[PATCH v7 34/34] Add -mpure-code support to the CM0 functions.
gcc/libgcc/ChangeLog:
2022-10-09 Daniel Engel

	* Makefile.in (MPURE_CODE): New macro defines __PURE_CODE__.
	(gcc_compile): Appended MPURE_CODE.
	* config/arm/lib1funcs.S (FUNC_START_SECTION): Set flags for
	__PURE_CODE__.
	* config/arm/clz2.S (__clzsi2): Added -mpure-code compatible
	instructions.
	* config/arm/ctz2.S (__ctzsi2): Same.
	* config/arm/popcnt.S (__popcountsi2, __popcountdi2): Same.

---
 libgcc/Makefile.in            |  5 -
 libgcc/config/arm/clz2.S      | 25 ++-
 libgcc/config/arm/ctz2.S      | 38 +--
 libgcc/config/arm/lib1funcs.S |  7 ++-
 libgcc/config/arm/popcnt.S    | 33 +-
 5 files changed, 98 insertions(+), 10 deletions(-)

diff --git a/libgcc/Makefile.in b/libgcc/Makefile.in
index 1fe708a93f7..da2da7046cc 100644
--- a/libgcc/Makefile.in
+++ b/libgcc/Makefile.in
@@ -307,6 +307,9 @@ CRTSTUFF_CFLAGS = -O2 $(GCC_CFLAGS) $(INCLUDES) $(MULTILIB_CFLAGS) -g0 \
 # Extra flags to use when compiling crt{begin,end}.o.
 CRTSTUFF_T_CFLAGS =

+# Pass the -mpure-code flag into assembly for conditional compilation.
+MPURE_CODE = $(if $(findstring -mpure-code,$(CFLAGS)), -D__PURE_CODE__)
+
 MULTIDIR := $(shell $(CC) $(CFLAGS) -print-multi-directory)
 MULTIOSDIR := $(shell $(CC) $(CFLAGS) -print-multi-os-directory)

@@ -316,7 +319,7 @@ inst_slibdir = $(slibdir)$(MULTIOSSUBDIR)
 gcc_compile_bare = $(CC) $(INTERNAL_CFLAGS) $(CFLAGS-$(<F>))
-gcc_compile = $(gcc_compile_bare) -o $@ $(SHLIB_CFLAGS)
+gcc_compile = $(gcc_compile_bare) -o $@ $(SHLIB_CFLAGS) $(MPURE_CODE)

--snip--

diff --git a/libgcc/config/arm/popcnt.S b/libgcc/config/arm/popcnt.S
--- a/libgcc/config/arm/popcnt.S
+++ b/libgcc/config/arm/popcnt.S
   <http://www.gnu.org/licenses/>. */

+#if defined(L_popcountdi2) || defined(L_popcountsi2)
+
+.macro ldmask reg, temp, value
+#if defined(__PURE_CODE__) && (__PURE_CODE__)
+  #ifdef NOT_ISA_TARGET_32BIT
+        movs    \reg, #(\value)
+        lsls    \temp, \reg, #8
+        orrs    \reg, \temp
+        lsls    \temp, \reg, #16
+        orrs    \reg, \temp
+  #else
+        // Assumption: __PURE_CODE__ only supports M-profile.
+        movw    \reg, #((\value) * 0x101)
+        movt    \reg, #((\value) * 0x101)
+  #endif
+#else
+        ldr     \reg, =((\value) * 0x1010101)
+#endif
+.endm
+
+#endif
+
 #ifdef L_popcountdi2

 // int __popcountdi2(int)
@@ -49,7 +72,7 @@ FUNC_START_SECTION popcountdi2 .text.sorted.libgcc.popcountdi2
 #else /* !__OPTIMIZE_SIZE__ */

         // Load the one-bit alternating mask.
-        ldr     r3, =0x55555555
+        ldmask  r3, r2, 0x55

         // Reduce the second word.
         lsrs    r2, r1, #1
@@ -62,7 +85,7 @@ FUNC_START_SECTION popcountdi2 .text.sorted.libgcc.popcountdi2
         subs    r0, r2

         // Load the two-bit alternating mask.
-        ldr     r3, =0x33333333
+        ldmask  r3, r2, 0x33

         // Reduce the second word.
         lsrs    r2, r1, #2
@@ -140,7 +163,7 @@ FUNC_ENTRY popcountsi2
 #else /* !__OPTIMIZE_SIZE__ */

         // Load the one-bit alternating mask.
-        ldr     r3, =0x55555555
+        ldmask  r3, r2, 0x55

         // Reduce the word.
         lsrs    r1, r0, #1
@@ -148,7 +171,7 @@ FUNC_ENTRY popcountsi2
         subs    r0, r1

         // Load the two-bit alternating mask.
-        ldr     r3, =0x33333333
+        ldmask  r3, r2, 0x33

         // Reduce the word.
         lsrs    r1, r0, #2
@@ -158,7 +181,7 @@ FUNC_ENTRY popcountsi2
         adds    r0, r1

         // Load the four-bit alternating mask.
-        ldr     r3, =0x0F0F0F0F
+        ldmask  r3, r2, 0x0F

         // Reduce the word.
         lsrs    r1, r0, #4
-- 
2.34.1
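The ldmask arithmetic above relies on byte replication by multiplication; the equivalent C, with an invented helper name:

/* 0x55 * 0x01010101 == 0x55555555.  The movw/movt path builds the same
   constant in two 16-bit halves using the factor 0x101; the shift/orrs
   path doubles the populated width twice (8 -> 16 -> 32 bits).  */
static inline unsigned int
replicate_byte (unsigned char value)
{
  return value * 0x01010101u;
}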
Re: [PATCH v3] libgcc: Thumb-1 Floating-Point Library for Cortex M0
--snip--

On Wed, Jan 6, 2021, at 9:05 AM, Richard Earnshaw wrote:
>
> Thanks for working on this, Daniel.
>
> This is clearly stage1 material, so we've got time for a couple of
> iterations to sort things out.

I appreciate your feedback. I had been hoping that with no regressions
this might still be eligible for stage2. Christophe never indicated
either way, but the fact that he was looking at it seemed positive.
I thought I would be a couple of weeks faster with this last iteration,
but holidays got in the way.

I actually think your comments below could all be addressable within a
couple of days. But, I'm not accounting for the review process.

> Firstly, the patch is very large, but contains a large number of
> distinct changes, so it would really benefit from being broken down into
> a number of distinct patches. This will make reviewing the individual
> changes much more straight-forward.

I have no context for "large" or "small" with respect to gcc. This patch
comprises about 30% of a previously-monolithic library that's been
shipping since ~2016 (the rest is libm material). Other than (1) the
aforementioned change to div0(), (2) a nascent adaptation for
__truncdfsf2() (not enabled), and (3) the gratuitous addition of the
bitwise functions, the library remains pretty much as it was originally
released.

The driving force in the development of this library was small size,
which of course was never possible with the softfp routines. It's not
half-slow, either, for the limitations of the M0 architecture. And,
it's IEEE compliant. But, that means that most of the functions are
highly interconnected. So, some of it can be broken up as you outline
below, but that last patch is still worth more than half of the total.

I also have ~70k lines of test vectors that seem mostly redundant, but
not completely. I haven't decided what to do here. For example, I have
coverage for __aeabi_u/ldivmod, while GCC does not. If I do anything
with this code it will be in a separate thread.

> I'd suggest:
>
> 1) Some basic makefile cleanups to ease initial integration - in
> particular where we have things like
>
> LIB1FUNCS +=
>
> that this be rewritten with one function per line (and sorted
> alphabetically) - then we can see which functions are being changed in
> subsequent patches. It makes the Makefile fragments longer, but the
> improvement in clarity for review makes this worthwhile.

I know next to nothing about Makefiles, particularly ones as complex as
GCC's. I was just trying to work with the existing style to avoid
breaking something. However, I can certainly adopt this suggestion.

> 2) The changes for the existing integer functions - preferably one
> function per patch.
>
> 3) The new integer functions that you're adding

These wouldn't be too hard to do, but what are the expectations for
testing? A clean build of GCC takes about 6 hours in my VM, and
regression testing takes about 4 hours per architecture. You would want
a full regression report for each incremental patch? I have no idea how
to target regression tests that apply to particular runtime functions
without the risk of missing something.

> 4) The floating-point support.
>
> Some more general observations:
>
> - where functions are already in lib1funcs.asm, please leave them there.

I guess I have a different vision here. I have had a really hard time
following all of the nested #ifdefs in lib1funcs, so I thought it would
be helpful to begin breaking it up into logical units.
The functions removed were all functions for which I had THUMB1
sequences faster/smaller than lib1funcs: __clzsi2, __clzdi2, __ctzsi2,
__ashrdi3, __lshrdi3, __ashldi3. In fact, the new THUMB1 version of
__clzsi2 is the same number of instructions as the previous ARM/THUMB2
version.

You will find all of the previous ARM versions of these functions
merged into the new files (with attribution) and the same preprocessor
selection path. So no architecture variant should be any worse off than
before this patch, and some beyond v6m should benefit.

In the future, I think that my versions of __divsi3 and __divdi3 will
prove faster/smaller than the existing THUMB2 versions. I know that my
float routines are less than half the compiled size of THUMB2 versions
in 'ieee754-sf.S'. However, I haven't profiled the exact performance
differences so I have left all this work for future patches. (It's also
quite likely that my version can be further refined with a few
judicious uses of THUMB2 alternatives.)

My long-term vision would be to use lib1funcs as an architectural
wrapper distinct from the implementation code.

> - let's avoid having the cm0 subdirectory - in particular we should do
> this when there is existing code for other targets in the same source
> files.  It's OK to have any new files in the main 'arm' directory of the
> source tree - just name the files appropriately if really needed.

Fair point on the name. In v1 of this patch, all these files were a
Re: [PATCH v3] libgcc: Thumb-1 Floating-Point Library for Cortex M0
On Thu, Jan 7, 2021, at 4:56 AM, Richard Earnshaw wrote:
> On 07/01/2021 00:59, Daniel Engel wrote:
> > --snip--
> >
> > On Wed, Jan 6, 2021, at 9:05 AM, Richard Earnshaw wrote:
> >
> >> Thanks for working on this, Daniel.
> >>
> >> This is clearly stage1 material, so we've got time for a couple of
> >> iterations to sort things out.
> >
> > I appreciate your feedback. I had been hoping that with no regressions
> > this might still be eligible for stage2. Christophe never indicated
> > either way, but the fact that he was looking at it seemed positive.
> > I thought I would be a couple of weeks faster with this last
> > iteration, but holidays got in the way.
>
> GCC doesn't have a stage 2 any more (historical wart). We were in
> (late) stage3 when this was first posted, and because of the significant
> impact this might have on not just CM0 but other targets as well, I
> don't think it's something we should try to squeeze in at the last
> minute. We're now in stage 4, so that is doubly the case.

Of course I meant stage3. Oops.

I actually thought stage 3 would continue through next week based on
the average historical dates. It would have been nice to get this
feedback when I emailed you a preview version of this patch
(2020-Nov-11). Christophe's logs have been very helpful on the
technical integration, but it's proving a chore to go back and
re-create some of the intermediate chunks.

Regardless, I still have free time for at least a little while longer
to work on this, so I'll push forward with any further feedback you are
willing to give me. I have failed to free up any time during the last 2
years to actually work on this during stage1, and I have no guarantee
this coming year will be different.

> Christophe is a very valuable member of our community, but he's not a
> port maintainer and thus cannot really rule on what can go into the
> tools, or when.
>
> >
> > I actually think your comments below could all be addressable within a
> > couple of days. But, I'm not accounting for the review process.
> >
> >> Firstly, the patch is very large, but contains a large number of
> >> distinct changes, so it would really benefit from being broken down into
> >> a number of distinct patches. This will make reviewing the individual
> >> changes much more straight-forward.
> >
> > I have no context for "large" or "small" with respect to gcc. This
> > patch comprises about 30% of a previously-monolithic library that's
> > been shipping since ~2016 (the rest is libm material). Other than
> > (1) the aforementioned change to div0(), (2) a nascent adaptation
> > for __truncdfsf2() (not enabled), and (3) the gratuitous addition of
> > the bitwise functions, the library remains pretty much as it was
> > originally released.
>
> Large, like many other terms, is relative. For assembler file changes,
> which this is primarily, the overall size can be much smaller and still
> be considered 'large'.
>
> >
> > The driving force in the development of this library was small size,
> > which of course was never possible with the softfp routines. It's not
> > half-slow, either, for the limitations of the M0 architecture. And,
> > it's IEEE compliant. But, that means that most of the functions are
> > highly interconnected. So, some of it can be broken up as you outline
> > below, but that last patch is still worth more than half of the total.
>
> Nevertheless, having the floating-point code separated out will make
> reviewing more straight forward.
> I'll likely need to ask one of our FP experts to have a specific look
> at that part and that will be easier if it is disentangled from the
> other changes.
>
> > I also have ~70k lines of test vectors that seem mostly redundant, but
> > not completely. I haven't decided what to do here. For example, I have
> > coverage for __aeabi_u/ldivmod, while GCC does not. If I do anything
> > with this code it will be in a separate thread.
>
> Publishing the test code, even if it isn't integrated into the GCC
> testsuite, would be useful. Perhaps someone else could then help with
> that.

Very brute force stuff, not production quality:
<http://danielengel.com/cm0_test_vectors.tgz> (160 kb)

> >> I'd suggest:
> >>
> >> 1) Some basic makefile cleanups to ease initial integration - in
> >> particular where we have things like
> >>
> >> LIB1FUNCS +=
> >>
> >> that this be rewritten with one function
Re: [PATCH v3] libgcc: Thumb-1 Floating-Point Library for Cortex M0
On Sat, Jan 9, 2021, at 5:09 AM, Christophe Lyon wrote:
> On Sat, 9 Jan 2021 at 13:27, Daniel Engel wrote:
> >
> > On Thu, Jan 7, 2021, at 4:56 AM, Richard Earnshaw wrote:
> > > On 07/01/2021 00:59, Daniel Engel wrote:
> > > > --snip--
> > > >
> > > > On Wed, Jan 6, 2021, at 9:05 AM, Richard Earnshaw wrote:
> > > >
> > > >> -- snip --
> > > >>
> > > >> - finally, your popcount implementations have data in the code segment.
> > > >> That's going to cause problems when we have compilation options such
> > > >> as -mpure-code.
> > > >
> > > > I am just following the precedent of existing lib1funcs (e.g. __clz2si).
> > > > If this matters, you'll need to point in the right direction for the
> > > > fix. I'm not sure it does matter, since these functions are PIC anyway.
> > >
> > > That might be a bug in the clz implementations - Christophe: Any thoughts?
> >
> > __clzsi2() has test coverage in "gcc.c-torture/execute/builtin-bitops-1.c"
> Thanks, I'll have a closer look at why I didn't see problems.
>
> > The 'clzs' and 'ctz' functions should never have problems. -mpure-code
> > appears to be valid only when the 'movt' instruction is available, which
> > means that the 'clz' instruction will also be available, so no array loads.
> No, -mpure-code is also supported with v6m.
>
> > Is the -mpure-code state detectable as a preprocessor flag? While
> No.
>
> > 'movw'/'movt' appears to be the canonical solution, I'm not sure it
> > should be the default just because a processor supports Thumb-2.
> >
> > Do users wanting to use -mpure-code recompile the toolchain to avoid
> > constant data in compiled C functions? I don't think this is the
> > default for the typical toolchain scripts.
> No, users of -mpure-code do not recompile the toolchain.

I won't claim that my use of inline constants is correct. It was not
hard to find references to high security model processors that block
reading from executable sections.

However, if all of the above is true, I think libgcc as a whole will
have much bigger problems. I count over 500 other instances in the
disassembled v6m *.a file where functions load pc-relative data from
'.text'. For example:

* C version of popcount
* __powidf2 (0x3FF0)
* __mulsc3 (0x7F7F)
* Most soft-float functions.

Still not seeing a clear resolution here. Is it acceptable to use the
"ldr rD, =const" pattern?

Thanks,
Daniel
Re: [PATCH v3] libgcc: Thumb-1 Floating-Point Library for Cortex M0
On Sat, Jan 9, 2021, at 5:09 AM, Christophe Lyon wrote:
> On Sat, 9 Jan 2021 at 13:27, Daniel Engel wrote:
> >
> > -- snip --
> >
> > To reiterate what I said above, I intend to push forward and incorporate
> > your current recommendations plus any further feedback I may get. I
> > expect you to say that this doesn't merit inclusion yet, but I'd rather
> > spend the time while I have it.
> >
> > I'll post a patch series for review within the next day or so.
>
> Here are the results of the validation of your latest version
> (20210105):
> https://people.linaro.org/~christophe.lyon/cross-validation/gcc-test-patches/r11-5993-g159b0bd9ce263dfb791eff5133b0ca0207201c84-cortex-m0-fplib-20210105.patch/report-build-info.html

Thanks for this.

> "BIG-REGR" just means the regression report is large enough that it's
> provided in compressed form to avoid overloading the browser.
>
> So it really seems your patch introduces regressions in arm*linux* configs.
> For the 2 arm-none-eabi configs which show regressions (cortex-m0 and
> cortex-m3), the logs seem to indicate some tests timed out, and it's
> possible the server used was overloaded.

Looks like I added _divdi3 in LIB1ASMFUNCS with too much scope. So the C
implementation gets locked out of the build. On EABI, _divdi3 is renamed
as __aeabi_ldivmod, so both symbols are always found. On GNU EABI, that
doesn't happen. It should be a trivial fix, and I think there are a
couple more similar. I'll integrate this change in the patch series.

> The same applies to the 3 aarch64*elf cases, where the regressions
> seem only caused by timed out; there's no reason your patch would have
> an impact on aarch64.
> (these 5 configs were tested on the same machine, so overload is indeed
> likely).
>
> I didn't check why all the ubsan tests now seem to fail, they are in
> the "unstable" category because in the past some of them had some
> randomness.
> I do not see such noise in trunk validation though.

I tried looking up a few of them to analyze. Couldn't find the names in
the logs (e.g. "pr95810"). Are you sure they actually failed, or just
didn't run? Regression reports say "ignored".

> Thanks,
>
> Christophe

Thanks again,
Daniel
Re: [PATCH v4 02/29] Refactor 'clz' functions into a new file.
On Mon, Jan 11, 2021, at 7:39 AM, Richard Earnshaw wrote:
> On 11/01/2021 15:26, Richard Earnshaw wrote:
> > On 11/01/2021 11:10, g...@danielengel.com wrote:
> >> From: Daniel Engel
> >>
> >> gcc/libgcc/ChangeLog:
> >> 2021-01-07 Daniel Engel
> >>
> >>	* config/arm/lib1funcs.S: Move __clzsi2() and __clzdi2() to
> >>	* config/arm/bits/clz2.S: New file.
> >
> > No, please don't push these down into a subdirectory.  They do not
> > represent a clear subfunctional distinction, so creating a load of disk
> > hierarcy is just confusing.  Just put the code in config/arm/clz.S
> >
> > Otherwise this is just a re-org, so it's OK.
>
> Oops, missed that as a new file, this needs to copy over the original
> copyright message.
>
> Same with the other re-orgs that split code up.

This is not a hard change, just noisy, so I'm checking ... the
estimated lifetime of this particular content is approximately 15
minutes. There is a copyright message in 05/29, and similar for the
other re-orgs.

> R.
>
> >
> > R.
> >
> >> ---
> >>  libgcc/config/arm/bits/clz2.S | 124 ++
> >>  libgcc/config/arm/lib1funcs.S | 123 +
> >>  2 files changed, 125 insertions(+), 122 deletions(-)
> >>  create mode 100644 libgcc/config/arm/bits/clz2.S
> >>
> >> diff --git a/libgcc/config/arm/bits/clz2.S b/libgcc/config/arm/bits/clz2.S
> >> new file mode 100644
> >> index 000..1c8f10a5b29
> >> --- /dev/null
> >> +++ b/libgcc/config/arm/bits/clz2.S
> >> @@ -0,0 +1,124 @@
> >> +
> >> +#ifdef L_clzsi2
> >> +#ifdef NOT_ISA_TARGET_32BIT
> >> +FUNC_START clzsi2
> >> +	movs	r1, #28
> >> +	movs	r3, #1
> >> +	lsls	r3, r3, #16
> >> +	cmp	r0, r3 /* 0x10000 */
> >> +	bcc	2f
> >> +	lsrs	r0, r0, #16
> >> +	subs	r1, r1, #16
> >> +2:	lsrs	r3, r3, #8
> >> +	cmp	r0, r3 /* #0x100 */
> >> +	bcc	2f
> >> +	lsrs	r0, r0, #8
> >> +	subs	r1, r1, #8
> >> +2:	lsrs	r3, r3, #4
> >> +	cmp	r0, r3 /* #0x10 */
> >> +	bcc	2f
> >> +	lsrs	r0, r0, #4
> >> +	subs	r1, r1, #4
> >> +2:	adr	r2, 1f
> >> +	ldrb	r0, [r2, r0]
> >> +	adds	r0, r0, r1
> >> +	bx	lr
> >> +.align 2
> >> +1:
> >> +.byte 4, 3, 2, 2, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0
> >> +	FUNC_END clzsi2
> >> +#else
> >> +ARM_FUNC_START clzsi2
> >> +# if defined (__ARM_FEATURE_CLZ)
> >> +	clz	r0, r0
> >> +	RET
> >> +# else
> >> +	mov	r1, #28
> >> +	cmp	r0, #0x10000
> >> +	do_it	cs, t
> >> +	movcs	r0, r0, lsr #16
> >> +	subcs	r1, r1, #16
> >> +	cmp	r0, #0x100
> >> +	do_it	cs, t
> >> +	movcs	r0, r0, lsr #8
> >> +	subcs	r1, r1, #8
> >> +	cmp	r0, #0x10
> >> +	do_it	cs, t
> >> +	movcs	r0, r0, lsr #4
> >> +	subcs	r1, r1, #4
> >> +	adr	r2, 1f
> >> +	ldrb	r0, [r2, r0]
> >> +	add	r0, r0, r1
> >> +	RET
> >> +.align 2
> >> +1:
> >> +.byte 4, 3, 2, 2, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0
> >> +# endif /* !defined (__ARM_FEATURE_CLZ) */
> >> +	FUNC_END clzsi2
> >> +#endif
> >> +#endif /* L_clzsi2 */
> >> +
> >> +#ifdef L_clzdi2
> >> +#if !defined (__ARM_FEATURE_CLZ)
> >> +
> >> +# ifdef NOT_ISA_TARGET_32BIT
> >> +FUNC_START clzdi2
> >> +	push	{r4, lr}
> >> +	cmp	xxh, #0
> >> +	bne	1f
> >> +# ifdef __ARMEB__
> >> +	movs	r0, xxl
> >> +	bl	__clzsi2
> >> +	adds	r0, r0, #32
> >> +	b	2f
> >> +1:
> >> +	bl	__clzsi2
> >> +# else
> >> +	bl	__clzsi2
> >> +	adds	r0, r0, #32
> >> +	b	2f
> >> +1:
> >> +	movs	r0, xxh
> >> +	bl	__clzsi2
> >> +# endif
> >> +2:
> >> +	pop	{r4, pc}
> >> +# else /* NOT_ISA_TARGET_32BIT */
> >> +ARM_FUNC_START clzdi2
> >> +	do_push	{r4, lr}
> >> +	cmp	xxh, #0
> >> +	bne	1f
> >> +# ifdef __ARMEB__
> >> +	mov	r0, xxl
> >> +	bl	__clzsi2
> >> +	add	r0, r0, #32
> >> +	b	2f
> >>
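For reference, the quoted Thumb-1 __clzsi2 sequence corresponds to this C model (a sketch with invented names, not part of the patch): three binary-search steps reduce the argument to 4 bits, then a 16-entry table finishes.

/* Mirrors the quoted assembly: conditional shift/subtract by 16, 8,
   and 4, then a table lookup on the remaining 4-bit value.  */
static const unsigned char clz4_tab[16] =
  { 4, 3, 2, 2, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0 };

int clzsi2_model (unsigned int x)
{
  int n = 28;
  if (x >= 0x10000) { x >>= 16; n -= 16; }
  if (x >= 0x100)   { x >>= 8;  n -= 8;  }
  if (x >= 0x10)    { x >>= 4;  n -= 4;  }
  return n + clz4_tab[x];   /* x == 0 yields 28 + 4 == 32 */
}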
Re: [PATCH v3] libgcc: Thumb-1 Floating-Point Library for Cortex M0
On Mon, Jan 11, 2021, at 8:07 AM, Christophe Lyon wrote:
> On Sat, 9 Jan 2021 at 14:09, Christophe Lyon wrote:
> >
> > On Sat, 9 Jan 2021 at 13:27, Daniel Engel wrote:
> > >
> > > On Thu, Jan 7, 2021, at 4:56 AM, Richard Earnshaw wrote:
> > > > On 07/01/2021 00:59, Daniel Engel wrote:
> > > > > --snip--
> > > > >
> > > > > On Wed, Jan 6, 2021, at 9:05 AM, Richard Earnshaw wrote:
> > > > > --snip--
> > > > >
> > > > >> - finally, your popcount implementations have data in the code
> > > > >> segment. That's going to cause problems when we have compilation
> > > > >> options such as -mpure-code.
> > > > >
> > > > > I am just following the precedent of existing lib1funcs (e.g.
> > > > > __clz2si). If this matters, you'll need to point in the right
> > > > > direction for the fix. I'm not sure it does matter, since these
> > > > > functions are PIC anyway.
> > > >
> > > > That might be a bug in the clz implementations - Christophe: Any
> > > > thoughts?
> > >
> > > __clzsi2() has test coverage in "gcc.c-torture/execute/builtin-bitops-1.c"
> > Thanks, I'll have a closer look at why I didn't see problems.
> >
>
> So, that's because the code goes to the .text section (as opposed to
> .text.noread) and does not have the PURECODE flag. The compiler takes
> care of this when generating code with -mpure-code. And the simulator
> does not complain because it only checks loads from the segment with
> the PURECODE flag set.
>

This is far out of my depth, but can something like:

ifeq (,$(findstring __symbian__,$(shell $(gcc_compile_bare) -dM -E -

> > The 'clzs' and 'ctz' functions should never have problems. -mpure-code
> > > appears to be valid only when the 'movt' instruction is available, which
> > > means that the 'clz' instruction will also be available, so no array
> > > loads.
> > No, -mpure-code is also supported with v6m.
> >
> > > Is the -mpure-code state detectable as a preprocessor flag? While
> > No.
> >
> > > 'movw'/'movt' appears to be the canonical solution, I'm not sure it
> > > should be the default just because a processor supports Thumb-2.
> > >
> > > Do users wanting to use -mpure-code recompile the toolchain to avoid
> > > constant data in compiled C functions? I don't think this is the
> > > default for the typical toolchain scripts.
> > No, users of -mpure-code do not recompile the toolchain.
>
> --snip --
>
Re: [PATCH v4 01/29] Add and organize macros.
On Mon, Jan 11, 2021, at 7:21 AM, Richard Earnshaw wrote:
> Some initial comments.
>
> On 11/01/2021 11:10, g...@danielengel.com wrote:
> > From: Daniel Engel
> >
> > These definitions facilitate subsequent patches in this series.
> >
> > gcc/libgcc/ChangeLog:
> > 2021-01-07 Daniel Engel
> >
> >	* config/arm/t-elf: Organize functions into logical groups.
> >	* config/arm/lib1funcs.S: Add FUNC_START macro variations for
> >	weak functions and manual control of the target section;
> >	rename THUMB_FUNC_START as THUMB_FUNC_ENTRY for consistency;
> >	removed unused macros THUMB_SYNTAX, ARM_SYM_START, SYM_END;
> >	removed redundant syntax directives.
>
> This needs to be re-formatted using the correct ChangeLog style, which
> is in most cases
>
> * <file> (<function>): <description>.
>
> You can repeat for multiple functions in the same file, but leave off
> the "* <file>" part as long as they are contiguous in the log.

Will do. Sorry.

> > ---
> >  libgcc/config/arm/lib1funcs.S | 114 +++---
> >  libgcc/config/arm/t-elf       |  55 +---
> >  2 files changed, 110 insertions(+), 59 deletions(-)
> >
> > diff --git a/libgcc/config/arm/lib1funcs.S b/libgcc/config/arm/lib1funcs.S
> > index c2fcfc503ec..b4541bae791 100644
> > --- a/libgcc/config/arm/lib1funcs.S
> > +++ b/libgcc/config/arm/lib1funcs.S
> > @@ -69,11 +69,13 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
> >  #define TYPE(x) .type SYM(x),function
> >  #define SIZE(x) .size SYM(x), . - SYM(x)
> >  #define LSYM(x) .x
> > +#define LLSYM(x) .L##x
> >  #else
> >  #define __PLT__
> >  #define TYPE(x)
> >  #define SIZE(x)
> >  #define LSYM(x) x
> > +#define LLSYM(x) x
> >  #endif
>
> I can live with this.
>
> >
> >  /* Function end macros.  Variants for interworking.  */
> > @@ -247,6 +249,14 @@
> >
> >  #define COND(op1, op2, cond) op1 ## op2 ## cond
> >
> > +#ifdef __ARM_FEATURE_IT
> > +  #define IT(ins,c) ins##c
> > +#else
> > +  // Assume default Thumb-1 flags-affecting suffix 's'.
> > +  // Almost all instructions require this in unified syntax.
> > +  #define IT(ins,c) ins##s
>
> This simply doesn't make much sense, at least, not enough to make it
> generally available. It seems it would be invariably wrong to replace a
> conditional instruction in arm/thumb2 code with a non-conditional flag
> setting instruction in thumb1. So please don't do this as it's likely
> to be a source of bugs going forwards if folk don't understand exactly
> when it is safe.

I'm going to push back in favor of this approach. I'm a huge believer
in DRY code, and duplicating sequences of 1-4 instructions in a dozen
different places feels unclean. Duplicating instructions also makes
code comments cumbersome, particularly when doing something tricky in
the conditional block.

I do understand your concern about the __ARM_FEATURE_IT name stepping
into ARM's namespace. I chose that as a low-mental-friction pattern and
I'd like to keep any replacement similar. Is __HAVE_FEATURE_IT OK?

This macro currently saves ~10 #ifdef blocks, and I expect that number
to rise significantly if/when I have time to merge ieee754-sf.S. (One
example might be __mulsf3() since it's relatively independent. It's now
360 bytes in v7m, and 96 bytes on v6m. I would just need to go through
the new version and a few Thumb-2 optimizations, such as using the
hardware multiply instructions instead of __mulsidi3.)

The safety you want boils down to whether or not __HAVE_FEATURE_IT gets
set correctly. The do_it() macro seems universally used within libgcc
to support both Thumb-2 and ARM compilation of the same code.
I've defined __HAVE_FEATURE_IT to have the same scope as do_it(), and
the assembler checks that conditionals are consistent with the previous
IT. While using the IT() macro without do_it() could result in
unintended "s" suffix instructions being emitted for Thumb-1,
compilation will fail when attempting to build any Thumb-2 multilib. At
that point, adding the do_it() macro will lead to __HAVE_FEATURE_IT and
everything should be self-evident.

I want the macro name to be short, so that it fits within one indent. I
briefly considered _(), but figured that would be too obtuse. I will
add the following comment before the macro to clarify:

/* The IT(c) macro streamlines the construction of short branchless
   conditional sequences that support ARM, Thumb-2, and Thumb-1. It is
   meant as an extension to the .do_it macro defined above. Code not
   written to sup
Re: [PATCH v4 05/29] Import replacement 'clz' functions from CM0 library
On Mon, Jan 11, 2021, at 8:32 AM, Richard Earnshaw wrote:
> A general comment before we start:
>
> CLZ was added to the Arm ISA in Armv5. So all subsequent Arm versions
> (and all versions implementing thumb2) will have this instruction. So
> the only cases where you'll need a fallback are armv6m (and derivatives)
> and pre-armv5 (Arm or thumb1). So there's no need in your code to try
> to use a synthesized CLZ operation when compiling for thumb2.

If you are referring to the "library formerly known as CM0", none of
that code was written to call clz, either synthesized or instruction.
The instruction just wasn't available to me, and the stack overhead to
call the library was never worth it. The clz file was in the CM0
library because higher level application code wanted it and we built
with -nostdlib. There are several optimizations to be made with the clz
instruction before the v6m floating point is suitable for other
architectures, but I don't anticipate ever calling these functions.

If you're referring to __clzsi2() and __clzdi2() at the top of the file
guarded by __ARM_FEATURE_CLZ, that code path is directly descended from
lib1funcs.S. I just merged it into !__ARM_FEATURE_CLZ. I think the
trivial functions still have to exist within libgcc, even if the
compiler doesn't call them.

> On 11/01/2021 11:10, g...@danielengel.com wrote:
> > From: Daniel Engel
> >
> > On architectures with no clz instruction, this version combines
> > __clzdi2() with __clzsi2() into a single object with an efficient tail
> > call. Also, this version merges the formerly separate Thumb and ARM
> > code implementations into a unified instruction sequence. This change
> > significantly improves the Thumb performance without affecting ARM
> > performance. Finally, this version adds a new __OPTIMIZE_SIZE__ build
> > option (using a loop).
> >
> > On architectures with a clz instruction, functionality is unchanged.
> >
> > gcc/libgcc/ChangeLog:
> > 2021-01-07 Daniel Engel
> >
> >	* config/arm/bits/clz2.S: Size-optimized bitwise versions of
> >	__clzsi2() and __clzdi2() (i.e. __ARM_FEATURE_CLZ not available).
> >	* config/arm/lib1funcs.S: Moved CFI_FUNCTION macros, added
> >	__ARM_FEATURE_IT.
> >	* config/arm/t-elf: Move _clzsi2 to new group of weak LIB1ASMFUNCS.
> >
> > ---
> >  libgcc/config/arm/bits/clz2.S | 342 ++
> >  libgcc/config/arm/lib1funcs.S |  25 ++-
> >  libgcc/config/arm/t-elf       |   8 +-
> >  3 files changed, 248 insertions(+), 127 deletions(-)
> >
> > diff --git a/libgcc/config/arm/bits/clz2.S b/libgcc/config/arm/bits/clz2.S
> > index 1c8f10a5b29..d0a1fbec4d0 100644
> > --- a/libgcc/config/arm/bits/clz2.S
> > +++ b/libgcc/config/arm/bits/clz2.S
> > @@ -1,124 +1,234 @@
> > +/* clz2.S: Cortex M0 optimized 'clz' functions
> > +
> > +   Copyright (C) 2018-2021 Free Software Foundation, Inc.
> > +   Contributed by Daniel Engel, Senva Inc (g...@danielengel.com)
> > +
> > +   This file is free software; you can redistribute it and/or modify it
> > +   under the terms of the GNU General Public License as published by the
> > +   Free Software Foundation; either version 3, or (at your option) any
> > +   later version.
> > +
> > +   This file is distributed in the hope that it will be useful, but
> > +   WITHOUT ANY WARRANTY; without even the implied warranty of
> > +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> > +   General Public License for more details.
> > + > > + Under Section 7 of GPL version 3, you are granted additional > > + permissions described in the GCC Runtime Library Exception, version > > + 3.1, as published by the Free Software Foundation. > > + > > + You should have received a copy of the GNU General Public License and > > + a copy of the GCC Runtime Library Exception along with this program; > > + see the files COPYING3 and COPYING.RUNTIME respectively. If not, see > > + <http://www.gnu.org/licenses/>. */ > > + > > + > > +#if defined(__ARM_FEATURE_CLZ) && __ARM_FEATURE_CLZ > > Writing the test this way is pointless. Either test for > __ARM_FEATURE_CLZ being defined, or test for it being non-zero; but not > both. C Macros default to a value of zero if not defined. > > In this case #ifdef is just fine - it won't be defined if the > instruction doesn't exist. > > Similar simplification should be used everywhere else you've used this > type of construct. I have been burned multiple times in the past
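For reference, the three guard forms under discussion behave as follows (a minimal sketch; in a #if expression an undefined identifier evaluates to 0):

    #if defined(__ARM_FEATURE_CLZ) && __ARM_FEATURE_CLZ  /* belt and braces */
    #endif

    #if __ARM_FEATURE_CLZ      /* equivalent: an undefined macro reads as 0 */
    #endif

    #ifdef __ARM_FEATURE_CLZ   /* also equivalent here, since per the review
                                  the macro is only defined when the
                                  instruction actually exists */
    #endif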
[PATCH v5 01/33] Add and restructure function declaration macros
Most of these changes support subsequent patches in this series. Particularly, the FUNC_START macro becomes part of a new macro chain:

  * FUNC_ENTRY           Common global symbol directives
  * FUNC_START_SECTION   FUNC_ENTRY to start a new <section>
  * FUNC_START           FUNC_START_SECTION <".text">

The effective definition of FUNC_START is unchanged from the previous version of lib1funcs. See code comments for detailed usage.

The new names FUNC_ENTRY and FUNC_START_SECTION were chosen specifically to complement the existing FUNC_START name. Alternate name patterns are possible (such as {FUNC_SYMBOL, FUNC_START_SECTION, FUNC_START_TEXT}), but any change to FUNC_START would require refactoring much of libgcc.

Additionally, a parallel chain of new macros supports weak functions:

  * WEAK_ENTRY
  * WEAK_START_SECTION
  * WEAK_START
  * WEAK_ALIAS

Moving the CFI_* macros earlier in the file widens their scope for use in additional functions.

gcc/libgcc/ChangeLog:
2021-01-14 Daniel Engel

	* config/arm/lib1funcs.S:
	(LLSYM): New macro prefix ".L" for strippable local symbols.
	(CFI_START_FUNCTION, CFI_END_FUNCTION): Moved earlier in the file.
	(FUNC_ENTRY): New macro for symbols with no ".section" directive.
	(WEAK_ENTRY): New macro FUNC_ENTRY + ".weak".
	(FUNC_START_SECTION): New macro FUNC_ENTRY with <section> argument.
	(WEAK_START_SECTION): New macro FUNC_START_SECTION + ".weak".
	(FUNC_START): Redefined in terms of FUNC_START_SECTION <".text">.
	(WEAK_START): New macro FUNC_START + ".weak".
	(WEAK_ALIAS): New macro FUNC_ALIAS + ".weak".
	(FUNC_END): Moved after FUNC_START macro group.
	(THUMB_FUNC_START): Moved near the other *FUNC* macros.
	(THUMB_SYNTAX, ARM_SYM_START, SYM_END): Deleted unused macros.
---
 libgcc/config/arm/lib1funcs.S | 109 +-
 1 file changed, 69 insertions(+), 40 deletions(-)

diff --git a/libgcc/config/arm/lib1funcs.S b/libgcc/config/arm/lib1funcs.S
index c2fcfc503ec..f14662d7e15 100644
--- a/libgcc/config/arm/lib1funcs.S
+++ b/libgcc/config/arm/lib1funcs.S
@@ -69,11 +69,13 @@ see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
 #define TYPE(x) .type SYM(x),function
 #define SIZE(x) .size SYM(x), . - SYM(x)
 #define LSYM(x) .x
+#define LLSYM(x) .L##x
 #else
 #define __PLT__
 #define TYPE(x)
 #define SIZE(x)
 #define LSYM(x) x
+#define LLSYM(x) x
 #endif

 /* Function end macros. Variants for interworking. */
@@ -182,6 +184,16 @@ LSYM(Lend_fde):
 #endif
 .endm

+.macro CFI_START_FUNCTION
+	.cfi_startproc
+	.cfi_remember_state
+.endm
+
+.macro CFI_END_FUNCTION
+	.cfi_restore_state
+	.cfi_endproc
+.endm
+
 /* Don't pass dirn, it's there just to get token pasting right. */

 .macro RETLDM regs=, cond=, unwind=, dirn=ia
@@ -324,10 +336,6 @@ LSYM(Lend_fde):
 .endm
 #endif

-.macro FUNC_END name
-	SIZE (__\name)
-.endm
-
 .macro DIV_FUNC_END name signed
 	cfi_start __\name, LSYM(Lend_div0)
 LSYM(Ldiv0):
@@ -340,48 +348,76 @@ LSYM(Ldiv0):
 	FUNC_END \name
 .endm

-.macro THUMB_FUNC_START name
-	.globl	SYM (\name)
-	TYPE(\name)
-	.thumb_func
-SYM (\name):
-.endm
-
 /* Function start macros. Variants for ARM and Thumb. */

 #ifdef __thumb__
 #define THUMB_FUNC .thumb_func
 #define THUMB_CODE .force_thumb
-# if defined(__thumb2__)
-#define THUMB_SYNTAX
-# else
-#define THUMB_SYNTAX
-# endif
 #else
 #define THUMB_FUNC
 #define THUMB_CODE
-#define THUMB_SYNTAX
 #endif

+.macro THUMB_FUNC_START name
+	.globl	SYM (\name)
+	TYPE(\name)
+	.thumb_func
+SYM (\name):
+.endm
+
+/* Strong global symbol, ".text" section.
+   The default macro for function declarations.
 */
 .macro FUNC_START name
-	.text
+	FUNC_START_SECTION \name .text
 .endm

+/* Weak global symbol, ".text" section.
+   Use WEAK_* macros to declare a function/object that may be discarded by
+   the linker when another library or object exports the same name.
+   Typically, functions declared with WEAK_* macros implement a subset of
+   functionality provided by the overriding definition, and are discarded
+   when the full functionality is required. */
+.macro WEAK_START name
+	.weak SYM(__\name)
+	FUNC_START_SECTION \name .text
+.endm
+
+/* Strong global symbol, alternate section.
+   Use the *_START_SECTION macros for declarations that the linker should
+   place in a non-default section (e.g. ".rodata", ".text.subsection"). */
+.macro FUNC_START_SECTION name section
+	.section \section,"x"
+	.align 0
+	FUNC_ENTRY \name
+.endm
+
+/* Weak global symbol, alternate section. */
+.macro WEAK_START_SECTION name section
+	.weak SYM(__\name)
+	FUNC_START_SECTION \name
[PATCH v5 00/33] libgcc: Thumb-1 Floating-Point Library for Cortex M0
function (EABI alias)            size (bytes)        cycles             stack  accuracy

__mulsf3 (__aeabi_fmul)          112+__shared_float  73..97             8      <= 0.5 ulp
__mulsf3 (OPTIMIZE_SIZE)          96+__shared_float  93                 8      <= 0.5 ulp
__divsf3 (__aeabi_fdiv)          132+__shared_float  83..361            8      <= 0.5 ulp
__divsf3 (OPTIMIZE_SIZE)         120+__shared_float  263..359           8      <= 0.5 ulp
__cmpsf2/__lesf2/__ltsf2          72                 33                 0      exact
__eqsf2/__nesf2                    4+__cmpsf2         3+__cmpsf2        0      exact
__gesf2/__gtsf2                    4+__cmpsf2         3+__cmpsf2        0      exact
__unordsf2 (__aeabi_fcmpun)        4+__cmpsf2         3+__cmpsf2        0      exact
__aeabi_fcmpeq                     4+__cmpsf2         3+__cmpsf2        0      exact
__aeabi_fcmpne                     4+__cmpsf2         3+__cmpsf2        0      exact
__aeabi_fcmplt                     4+__cmpsf2         3+__cmpsf2        0      exact
__aeabi_fcmple                     4+__cmpsf2         3+__cmpsf2        0      exact
__aeabi_fcmpge                     4+__cmpsf2         3+__cmpsf2        0      exact
__floatundisf (__aeabi_ul2f)      14+__shared_float  40..81             8      <= 0.5 ulp
__floatundisf (OPTIMIZE_SIZE)     14+__shared_float  40..237            8      <= 0.5 ulp
__floatunsisf (__aeabi_ui2f)       0+__floatundisf    1+__floatundisf   8      <= 0.5 ulp
__floatdisf (__aeabi_l2f)         14+__floatundisf    7+__floatundisf   8      <= 0.5 ulp
__floatsisf (__aeabi_i2f)          0+__floatdisf      1+__floatdisf     8      <= 0.5 ulp
__fixsfdi (__aeabi_f2lz)          74                 27..33             0      exact
__fixunssfdi (__aeabi_f2ulz)       4+__fixsfdi        3+__fixsfdi       0      exact
__fixsfsi (__aeabi_f2iz)          52                 19                 0      exact
__fixsfsi (OPTIMIZE_SIZE)          4+__fixsfdi        3+__fixsfdi       0      exact
__fixunssfsi (__aeabi_f2uiz)       4+__fixsfsi        3+__fixsfsi       0      exact
__extendsfdf2 (__aeabi_f2d)       42+__shared_float  38                 8      exact
__truncsfdf2 (__aeabi_f2d)        88                 34                 8      exact
__aeabi_d2f                       56+__shared_float  54..58             8      <= 0.5 ulp
__aeabi_h2f                       34+__shared_float  34                 8      exact
__aeabi_f2h                       84                 23..34             0      <= 0.5 ulp

Copyright assignment is on file with the FSF.

Thanks,
Daniel Engel

[1] // Test program for size comparison

    extern int main (void)
    {
        volatile int x = 1;
        volatile unsigned long long int y = 10;
        volatile long long int z = x / y; // 64-bit division

        volatile float a = x; // 32-bit casting
        volatile float b = y; // 64 bit casting
        volatile float c = z / b; // float division
        volatile float d = a + c; // float addition
        volatile float e = c * b; // float multiplication
        volatile float f = d - e - c; // float subtraction

        if (f != c) // float comparison
            y -= (long long int)d; // float casting
    }

[2] http://danielengel.com/cm0_test_vectors.tgz
[3] http://www.netlib.org/fp/ucbtest.tgz
[4] http://www.jhauser.us/arithmetic/TestFloat.html
[5] http://win-www.uia.ac.be/u/cant/ieeecc754.html
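A note on reading the table: entries such as "4+__fixsfdi" indicate a thin wrapper that adds only a few bytes on top of a shared core routine. In C terms, the size-optimized wrapper strategy amounts to roughly the following (a hypothetical sketch for illustration; the actual library routines are assembly, and the "_sz" name is invented here):

    /* 64-bit core conversion, shared by the 32-bit variant. */
    extern long long __fixsfdi (float f);

    /* Size-optimized 32-bit conversion: a few bytes of entry code
       that reuses the 64-bit conversion and truncates the result. */
    int __fixsfsi_sz (float f)
    {
        return (int) __fixsfdi (f);
    }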
[PATCH v5 02/33] Rename THUMB_FUNC_START to THUMB_FUNC_ENTRY
Since THUMB_FUNC_START does not insert the ".text" directive, it aligns more closely with the new FUNC_ENTRY macro and is renamed accordingly. THUMB_FUNC_START usage has been universally synonymous with the ".force_thumb" directive, so this is now folded into the definition.

Usage of ".force_thumb" and ".thumb_func" is now tightly coupled throughout the "arm" subdirectory.

gcc/libgcc/ChangeLog:
2021-01-14 Daniel Engel

	* config/arm/lib1funcs.S:
	(THUMB_FUNC_START): Renamed to ...
	(THUMB_FUNC_ENTRY): for consistency; also added ".force_thumb".
	(_call_via_r0): Removed redundant preceding ".force_thumb".
	(__gnu_thumb1_case_sqi, __gnu_thumb1_case_uqi, __gnu_thumb1_case_shi,
	__gnu_thumb1_case_si): Removed redundant ".force_thumb" and ".syntax".
---
 libgcc/config/arm/lib1funcs.S | 32 +++-
 1 file changed, 11 insertions(+), 21 deletions(-)

diff --git a/libgcc/config/arm/lib1funcs.S b/libgcc/config/arm/lib1funcs.S
index f14662d7e15..65d070d8178 100644
--- a/libgcc/config/arm/lib1funcs.S
+++ b/libgcc/config/arm/lib1funcs.S
@@ -358,10 +358,11 @@ LSYM(Ldiv0):
 #define THUMB_CODE
 #endif

-.macro THUMB_FUNC_START name
+.macro THUMB_FUNC_ENTRY name
 	.globl	SYM (\name)
 	TYPE(\name)
 	.thumb_func
+	.force_thumb
 SYM (\name):
 .endm

@@ -1944,10 +1945,9 @@ ARM_FUNC_START ctzsi2
 	.text
 	.align 0
-	.force_thumb

 .macro call_via register
-	THUMB_FUNC_START _call_via_\register
+	THUMB_FUNC_ENTRY _call_via_\register

 	bx	\register
 	nop
@@ -2030,7 +2030,7 @@ _arm_return_r11:
 .macro interwork_with_frame frame, register, name, return
 	.code	16

-	THUMB_FUNC_START \name
+	THUMB_FUNC_ENTRY \name

 	bx	pc
 	nop
@@ -2047,7 +2047,7 @@ _arm_return_r11:
 .macro interwork register
 	.code	16

-	THUMB_FUNC_START _interwork_call_via_\register
+	THUMB_FUNC_ENTRY _interwork_call_via_\register

 	bx	pc
 	nop
@@ -2084,7 +2084,7 @@ LSYM(Lchange_\register):
 	/* The LR case has to be handled a little differently... */
 	.code 16

-	THUMB_FUNC_START _interwork_call_via_lr
+	THUMB_FUNC_ENTRY _interwork_call_via_lr

 	bx	pc
 	nop
@@ -2112,9 +2112,7 @@ LSYM(Lchange_\register):
 	.text
 	.align 0
-	.force_thumb
-	.syntax unified
-	THUMB_FUNC_START __gnu_thumb1_case_sqi
+	THUMB_FUNC_ENTRY __gnu_thumb1_case_sqi
 	push	{r1}
 	mov	r1, lr
 	lsrs	r1, r1, #1
@@ -2131,9 +2129,7 @@ LSYM(Lchange_\register):
 	.text
 	.align 0
-	.force_thumb
-	.syntax unified
-	THUMB_FUNC_START __gnu_thumb1_case_uqi
+	THUMB_FUNC_ENTRY __gnu_thumb1_case_uqi
 	push	{r1}
 	mov	r1, lr
 	lsrs	r1, r1, #1
@@ -2150,9 +2146,7 @@ LSYM(Lchange_\register):
 	.text
 	.align 0
-	.force_thumb
-	.syntax unified
-	THUMB_FUNC_START __gnu_thumb1_case_shi
+	THUMB_FUNC_ENTRY __gnu_thumb1_case_shi
 	push	{r0, r1}
 	mov	r1, lr
 	lsrs	r1, r1, #1
@@ -2170,9 +2164,7 @@ LSYM(Lchange_\register):
 	.text
 	.align 0
-	.force_thumb
-	.syntax unified
-	THUMB_FUNC_START __gnu_thumb1_case_uhi
+	THUMB_FUNC_ENTRY __gnu_thumb1_case_uhi
 	push	{r0, r1}
 	mov	r1, lr
 	lsrs	r1, r1, #1
@@ -2190,9 +2182,7 @@ LSYM(Lchange_\register):
 	.text
 	.align 0
-	.force_thumb
-	.syntax unified
-	THUMB_FUNC_START __gnu_thumb1_case_si
+	THUMB_FUNC_ENTRY __gnu_thumb1_case_si
 	push	{r0, r1}
 	mov	r1, lr
 	adds.n	r1, r1, #2	/* Align to word. */
--
2.25.1
[PATCH v5 03/33] Fix syntax warnings on conditional instructions
gcc/libgcc/ChangeLog:
2021-01-14 Daniel Engel

	* config/arm/lib1funcs.S (RETLDM, ARM_DIV_BODY, ARM_MOD_BODY,
	_interwork_call_via_lr): Moved condition code after the flags
	update specifier "s".
	(ARM_FUNC_START, THUMB_LDIV0): Removed redundant ".syntax".
---
 libgcc/config/arm/lib1funcs.S | 12 +---
 1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/libgcc/config/arm/lib1funcs.S b/libgcc/config/arm/lib1funcs.S
index 65d070d8178..b8693be8e4f 100644
--- a/libgcc/config/arm/lib1funcs.S
+++ b/libgcc/config/arm/lib1funcs.S
@@ -204,7 +204,7 @@ LSYM(Lend_fde):
 # if defined(__thumb2__)
 	pop\cond	{\regs, lr}
 # else
-	ldm\cond\dirn	sp!, {\regs, lr}
+	ldm\dirn\cond	sp!, {\regs, lr}
 # endif
 .endif
 .ifnc "\unwind", ""
@@ -220,7 +220,7 @@ LSYM(Lend_fde):
 # if defined(__thumb2__)
 	pop\cond	{\regs, pc}
 # else
-	ldm\cond\dirn	sp!, {\regs, pc}
+	ldm\dirn\cond	sp!, {\regs, pc}
 # endif
 .endif
 #endif
@@ -292,7 +292,6 @@ LSYM(Lend_fde):
 	pop	{r1, pc}

 #elif defined(__thumb2__)
-	.syntax unified
 	.ifc \signed, unsigned
 	cbz	r0, 1f
 	mov	r0, #0xffffffff
@@ -429,7 +428,6 @@ SYM (__\name):
 /* For Thumb-2 we build everything in thumb mode. */
 .macro ARM_FUNC_START name
 	FUNC_START \name
-	.syntax unified
 .endm
 #define EQUIV .thumb_set
 .macro ARM_CALL name
@@ -643,7 +641,7 @@ pc	.req	r15
 	orrhs	\result, \result, \curbit, lsr #3
 	cmp	\dividend, #0			@ Early termination?
 	do_it	ne, t
-	movnes	\curbit, \curbit, lsr #4	@ No, any more bits to do?
+	movsne	\curbit, \curbit, lsr #4	@ No, any more bits to do?
 	movne	\divisor, \divisor, lsr #4
 	bne	1b
@@ -745,7 +743,7 @@ pc	.req	r15
 	subhs	\dividend, \dividend, \divisor, lsr #3
 	cmp	\dividend, #1
 	mov	\divisor, \divisor, lsr #4
-	subges	\order, \order, #4
+	subsge	\order, \order, #4
 	bge	1b

 	tst	\order, #3
@@ -2093,7 +2091,7 @@ LSYM(Lchange_\register):
 	.globl .Lchange_lr
 .Lchange_lr:
 	tst	lr, #1
-	stmeqdb	r13!, {lr, pc}
+	stmdbeq	r13!, {lr, pc}
 	mov	ip, lr
 	adreq	lr, _arm_return
 	bx	ip
--
2.25.1
[PATCH v5 04/33] Reorganize LIB1ASMFUNCS object wrapper macros
This will make it easier to isolate changes in subsequent patches. gcc/libgcc/ChangeLog: 2021-01-14 Daniel Engel * config/arm/t-elf (LIB1ASMFUNCS): Split macros into logical groups. --- libgcc/config/arm/t-elf | 66 + 1 file changed, 53 insertions(+), 13 deletions(-) diff --git a/libgcc/config/arm/t-elf b/libgcc/config/arm/t-elf index 9da6cd37054..93ea1cd8f76 100644 --- a/libgcc/config/arm/t-elf +++ b/libgcc/config/arm/t-elf @@ -14,19 +14,59 @@ LIB1ASMFUNCS += _arm_muldf3 _arm_mulsf3 endif endif # !__symbian__ -# For most CPUs we have an assembly soft-float implementations. -# However this is not true for ARMv6M. Here we want to use the soft-fp C -# implementation. The soft-fp code is only build for ARMv6M. This pulls -# in the asm implementation for other CPUs. -LIB1ASMFUNCS += _udivsi3 _divsi3 _umodsi3 _modsi3 _dvmd_tls _bb_init_func \ - _call_via_rX _interwork_call_via_rX \ - _lshrdi3 _ashrdi3 _ashldi3 \ - _arm_negdf2 _arm_addsubdf3 _arm_muldivdf3 _arm_cmpdf2 _arm_unorddf2 \ - _arm_fixdfsi _arm_fixunsdfsi \ - _arm_truncdfsf2 _arm_negsf2 _arm_addsubsf3 _arm_muldivsf3 \ - _arm_cmpsf2 _arm_unordsf2 _arm_fixsfsi _arm_fixunssfsi \ - _arm_floatdidf _arm_floatdisf _arm_floatundidf _arm_floatundisf \ - _clzsi2 _clzdi2 _ctzsi2 +# This pulls in the available assembly function implementations. +# The soft-fp code is only built for ARMv6M, since there is no +# assembly implementation here for double-precision values. + + +# Group 1: Integer function objects. +LIB1ASMFUNCS += \ + _ashldi3 \ + _ashrdi3 \ + _lshrdi3 \ + _clzdi2 \ + _clzsi2 \ + _ctzsi2 \ + _dvmd_tls \ + _divsi3 \ + _modsi3 \ + _udivsi3 \ + _umodsi3 \ + + +# Group 2: Single precision floating point function objects. +LIB1ASMFUNCS += \ + _arm_addsubsf3 \ + _arm_cmpsf2 \ + _arm_fixsfsi \ + _arm_fixunssfsi \ + _arm_floatdisf \ + _arm_floatundisf \ + _arm_muldivsf3 \ + _arm_negsf2 \ + _arm_unordsf2 \ + + +# Group 3: Double precision floating point function objects. +LIB1ASMFUNCS += \ + _arm_addsubdf3 \ + _arm_cmpdf2 \ + _arm_fixdfsi \ + _arm_fixunsdfsi \ + _arm_floatdidf \ + _arm_floatundidf \ + _arm_muldivdf3 \ + _arm_negdf2 \ + _arm_truncdfsf2 \ + _arm_unorddf2 \ + + +# Group 4: Miscellaneous function objects. +LIB1ASMFUNCS += \ + _bb_init_func \ + _call_via_rX \ + _interwork_call_via_rX \ + # Currently there is a bug somewhere in GCC's alias analysis # or scheduling code that is breaking _fpmul_parts in fp-bit.c. -- 2.25.1
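For readers unfamiliar with this machinery: each name in LIB1ASMFUNCS becomes its own object file, built by assembling lib1funcs.S with the corresponding L_<name> macro defined on the command line, so the #ifdef guards in the source select exactly one function per object. A simplified sketch of the pattern (function bodies elided):

    /* lib1funcs.S is assembled once per LIB1ASMFUNCS entry.  Building
       the _clzsi2 object defines L_clzsi2, so only this block remains: */
    #ifdef L_clzsi2
    /* ... __clzsi2 implementation ... */
    #endif

    #ifdef L_ctzsi2
    /* ... __ctzsi2 implementation, kept only in the _ctzsi2 object ... */
    #endif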
[PATCH v5 05/33] Add the __HAVE_FEATURE_IT and IT() macros
These macros complement and extend the existing do_it() macro. Together, they streamline the process of optimizing short branchless conditional sequences to support ARM, Thumb-2, and Thumb-1.

The inherent architecture limitations of Thumb-1 mean that writing assembly code is somewhat more tedious. And, while such code will run unmodified in an ARM or Thumb-2 environment, it will lack one of the key performance optimizations available there.

The first idea might be to split an instruction sequence with #ifdef(s): one path for Thumb-1 and the other for ARM/Thumb-2. This could suffice if conditional execution optimizations were rare. However, #ifdef(s) break the flow of an algorithm and shift focus to the architectural differences instead of the similarities. On functions with a high percentage of conditional execution, it starts to become attractive to split everything into distinct architecture-specific function objects -- even when the underlying algorithm is identical. Additionally, duplicated code and comments (whether an individual operand, a line, or a larger block) become a future maintenance liability if the two versions aren't kept in sync.

See code comments for limitations and expected usage.

gcc/libgcc/ChangeLog:
2021-01-14 Daniel Engel

	* config/arm/lib1funcs.S (__HAVE_FEATURE_IT, IT): New macros.
---
 libgcc/config/arm/lib1funcs.S | 68 +++
 1 file changed, 68 insertions(+)

diff --git a/libgcc/config/arm/lib1funcs.S b/libgcc/config/arm/lib1funcs.S
index b8693be8e4f..1233b8c0992 100644
--- a/libgcc/config/arm/lib1funcs.S
+++ b/libgcc/config/arm/lib1funcs.S
@@ -230,6 +230,7 @@ LSYM(Lend_fde):
    ARM and Thumb-2.  However this is only supported by recent gas, so define
    a set of macros to allow ARM code on older assemblers. */
 #if defined(__thumb2__)
+#define __HAVE_FEATURE_IT
 .macro do_it cond, suffix=""
 	it\suffix	\cond
 .endm
@@ -245,6 +246,9 @@ LSYM(Lend_fde):
 	\name \dest, \src1, \tmp
 .endm
 #else
+#if !defined(__thumb__)
+#define __HAVE_FEATURE_IT
+#endif
 .macro do_it cond, suffix=""
 .endm
 .macro shift1 op, arg0, arg1, arg2
@@ -259,6 +263,70 @@ LSYM(Lend_fde):

 #define COND(op1, op2, cond) op1 ## op2 ## cond

+
+/* The IT() macro streamlines the construction of short branchless conditional
+   sequences that support ARM, Thumb-2, and Thumb-1.  It is intended as an
+   extension to the .do_it macro defined above.  Code not written with the
+   intent to support Thumb-1 need not use IT().
+
+   IT()'s main advantage is the minimization of syntax differences.  Unified
+   functions can support Thumb-1 without imposing an undue performance
+   penalty on ARM and Thumb-2.  Writing code without duplicate instructions
+   and operands keeps the high level function flow clearer and should reduce
+   the incidence of maintenance bugs.
+
+   Where conditional execution is supported by ARM and Thumb-2, the specified
+   instruction compiles with the conditional suffix 'c'.
+
+   Where Thumb-1 and v6m do not support IT, the given instruction compiles
+   with the standard unified syntax suffix "s", and a preceding branch
+   instruction is required to implement conditional behavior.
+
+   (Aside: The Thumb-1 "s"-suffix pattern is somewhat simplistic, since it
+   does not support 'cmp' or 'tst' with a non-"s" suffix.  It also appends
+   "s" to 'mov' and 'add' with high register operands which are otherwise
+   legal on v6m.  Use of IT() will result in a compiler error for all of
+   these exceptional cases, and a full #ifdef code split will be required.
+   However, it is unlikely that code written with Thumb-1 compatibility
+   in mind will use such patterns, so IT() still promises a good value.)
+
+   Typical if/then/else usage is:
+
+	#ifdef __HAVE_FEATURE_IT
+	  // ARM and Thumb-2 'true' condition.
+	  do_it	c, tee
+	#else
+	  // Thumb-1 'false' condition.  This must be opposite the
+	  // sense of the ARM and Thumb-2 condition, since the
+	  // branch is taken to skip the 'true' instruction block.
+	  b!c	else_label
+	#endif
+
+	// Conditional 'true' execution for all compile modes.
+	IT(ins1,c)	op1,	op2
+	IT(ins2,c)	op1,	op2
+
+	#ifndef __HAVE_FEATURE_IT
+	  // Thumb-1 branch to skip the 'else' instruction block.
+	  // Omitted for if/then usage.
+	  b	end_label
+	#endif
+
+  else_label:
+	// Conditional 'false' execution for all compile modes.
+	// Omitted for if/then usage.
+	IT(ins3,!c)	op1,	op2
+	IT(ins4,!c)	op1,	op2
+
+  end_label:
+	// Unconditional execution resumes here.
+ */
+
+#ifdef __HAVE_FEATURE_IT
+  #define IT(ins,c) ins##c
+#else
+  #define IT(ins,c) ins##s
+#endif
+
 #ifdef __ARM_EABI__
 .macro ARM_LDIV0 name signed
 	cmp	r0, #0
--
2.25.1
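To make the skeleton concrete, here is a hypothetical C-level equivalent of a typical IT() use; the function and the instruction mapping in the comments are illustrative, not taken from the patch:

    /* Branchless on ARM/Thumb-2, one short branch on Thumb-1. */
    unsigned int umin (unsigned int a, unsigned int b)
    {
        /* 'cmp a, b' followed by do_it(hi) on ARM/Thumb-2,
           or by 'bls' around the assignment on Thumb-1.     */
        if (a > b)
            a = b;   /* IT(mov,hi): 'movhi' or flag-setting 'movs'. */
        return a;
    }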
[PATCH v5 06/33] Refactor 'clz' functions into a new file
This will make it easier to isolate changes in subsequent patches. gcc/libgcc/ChangeLog: 2021-01-13 Daniel Engel * config/arm/lib1funcs.S (__clzsi2i, __clzdi2): Moved to ... * config/arm/clz2.S: New file. --- libgcc/config/arm/clz2.S | 145 ++ libgcc/config/arm/lib1funcs.S | 123 +--- 2 files changed, 146 insertions(+), 122 deletions(-) create mode 100644 libgcc/config/arm/clz2.S diff --git a/libgcc/config/arm/clz2.S b/libgcc/config/arm/clz2.S new file mode 100644 index 000..2ad9a81892c --- /dev/null +++ b/libgcc/config/arm/clz2.S @@ -0,0 +1,145 @@ +/* Copyright (C) 1995-2021 Free Software Foundation, Inc. + +This file is free software; you can redistribute it and/or modify it +under the terms of the GNU General Public License as published by the +Free Software Foundation; either version 3, or (at your option) any +later version. + +This file is distributed in the hope that it will be useful, but +WITHOUT ANY WARRANTY; without even the implied warranty of +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU +General Public License for more details. + +Under Section 7 of GPL version 3, you are granted additional +permissions described in the GCC Runtime Library Exception, version +3.1, as published by the Free Software Foundation. + +You should have received a copy of the GNU General Public License and +a copy of the GCC Runtime Library Exception along with this program; +see the files COPYING3 and COPYING.RUNTIME respectively. If not, see +<http://www.gnu.org/licenses/>. */ + + +#ifdef L_clzsi2 +#ifdef NOT_ISA_TARGET_32BIT +FUNC_START clzsi2 + movsr1, #28 + movsr3, #1 + lslsr3, r3, #16 + cmp r0, r3 /* 0x1 */ + bcc 2f + lsrsr0, r0, #16 + subsr1, r1, #16 +2: lsrsr3, r3, #8 + cmp r0, r3 /* #0x100 */ + bcc 2f + lsrsr0, r0, #8 + subsr1, r1, #8 +2: lsrsr3, r3, #4 + cmp r0, r3 /* #0x10 */ + bcc 2f + lsrsr0, r0, #4 + subsr1, r1, #4 +2: adr r2, 1f + ldrbr0, [r2, r0] + addsr0, r0, r1 + bx lr +.align 2 +1: +.byte 4, 3, 2, 2, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0 + FUNC_END clzsi2 +#else +ARM_FUNC_START clzsi2 +# if defined (__ARM_FEATURE_CLZ) + clz r0, r0 + RET +# else + mov r1, #28 + cmp r0, #0x1 + do_it cs, t + movcs r0, r0, lsr #16 + subcs r1, r1, #16 + cmp r0, #0x100 + do_it cs, t + movcs r0, r0, lsr #8 + subcs r1, r1, #8 + cmp r0, #0x10 + do_it cs, t + movcs r0, r0, lsr #4 + subcs r1, r1, #4 + adr r2, 1f + ldrbr0, [r2, r0] + add r0, r0, r1 + RET +.align 2 +1: +.byte 4, 3, 2, 2, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0 +# endif /* !defined (__ARM_FEATURE_CLZ) */ + FUNC_END clzsi2 +#endif +#endif /* L_clzsi2 */ + +#ifdef L_clzdi2 +#if !defined (__ARM_FEATURE_CLZ) + +# ifdef NOT_ISA_TARGET_32BIT +FUNC_START clzdi2 + push{r4, lr} + cmp xxh, #0 + bne 1f +# ifdef __ARMEB__ + movsr0, xxl + bl __clzsi2 + addsr0, r0, #32 + b 2f +1: + bl __clzsi2 +# else + bl __clzsi2 + addsr0, r0, #32 + b 2f +1: + movsr0, xxh + bl __clzsi2 +# endif +2: + pop {r4, pc} +# else /* NOT_ISA_TARGET_32BIT */ +ARM_FUNC_START clzdi2 + do_push {r4, lr} + cmp xxh, #0 + bne 1f +# ifdef __ARMEB__ + mov r0, xxl + bl __clzsi2 + add r0, r0, #32 + b 2f +1: + bl __clzsi2 +# else + bl __clzsi2 + add r0, r0, #32 + b 2f +1: + mov r0, xxh + bl __clzsi2 +# endif +2: + RETLDM r4 + FUNC_END clzdi2 +# endif /* NOT_ISA_TARGET_32BIT */ + +#else /* defined (__ARM_FEATURE_CLZ) */ + +ARM_FUNC_START clzdi2 + cmp xxh, #0 + do_it eq, et + clzeq r0, xxl + clzne r0, xxh + addeq r0, r0, #32 + RET + FUNC_END clzdi2 + +#endif +#endif /* L_clzdi2 */ + diff --git a/libgcc/config/arm/lib1funcs.S b/libgcc/config/arm/lib1funcs.S index 1233b8c0992..d92f73ba0c9 
100644 --- a/libgcc/config/arm/lib1funcs.S +++ b/libgcc/config/arm/lib1funcs.S @@ -1803,128 +1803,7 @@ LSYM(Lover12): #endif /* __symbian__ */ -#ifdef L_clzsi2 -#ifdef NOT_ISA_TARGET_32BIT -FUNC_START clzsi2 - movsr1, #28 - movsr3, #1 - lslsr3, r3, #16 - cmp r0, r3 /* 0x1 */ - bcc 2f - lsrsr0, r0, #16 - subsr1, r1, #16 -2: lsrsr3, r3, #8 - cmp r0, r3 /* #0x100 */ - bcc 2f - lsrsr0, r0, #8 - subsr1, r1, #8 -2: lsrsr3, r3, #4 - cmp r0, r3 /* #0x10 */ - bcc 2f - lsrs
[PATCH v5 07/33] Refactor 'ctz' functions into a new file
This will make it easier to isolate changes in subsequent patches. gcc/libgcc/ChangeLog: 2021-01-13 Daniel Engel * config/arm/lib1funcs.S (__ctzsi2): Moved to ... * config/arm/ctz2.S: New file. --- libgcc/config/arm/ctz2.S | 86 +++ libgcc/config/arm/lib1funcs.S | 65 +- 2 files changed, 87 insertions(+), 64 deletions(-) create mode 100644 libgcc/config/arm/ctz2.S diff --git a/libgcc/config/arm/ctz2.S b/libgcc/config/arm/ctz2.S new file mode 100644 index 000..8702c9afb94 --- /dev/null +++ b/libgcc/config/arm/ctz2.S @@ -0,0 +1,86 @@ +/* Copyright (C) 1995-2021 Free Software Foundation, Inc. + +This file is free software; you can redistribute it and/or modify it +under the terms of the GNU General Public License as published by the +Free Software Foundation; either version 3, or (at your option) any +later version. + +This file is distributed in the hope that it will be useful, but +WITHOUT ANY WARRANTY; without even the implied warranty of +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU +General Public License for more details. + +Under Section 7 of GPL version 3, you are granted additional +permissions described in the GCC Runtime Library Exception, version +3.1, as published by the Free Software Foundation. + +You should have received a copy of the GNU General Public License and +a copy of the GCC Runtime Library Exception along with this program; +see the files COPYING3 and COPYING.RUNTIME respectively. If not, see +<http://www.gnu.org/licenses/>. */ + + +#ifdef L_ctzsi2 +#ifdef NOT_ISA_TARGET_32BIT +FUNC_START ctzsi2 + negsr1, r0 + andsr0, r0, r1 + movsr1, #28 + movsr3, #1 + lslsr3, r3, #16 + cmp r0, r3 /* 0x1 */ + bcc 2f + lsrsr0, r0, #16 + subsr1, r1, #16 +2: lsrsr3, r3, #8 + cmp r0, r3 /* #0x100 */ + bcc 2f + lsrsr0, r0, #8 + subsr1, r1, #8 +2: lsrsr3, r3, #4 + cmp r0, r3 /* #0x10 */ + bcc 2f + lsrsr0, r0, #4 + subsr1, r1, #4 +2: adr r2, 1f + ldrbr0, [r2, r0] + subsr0, r0, r1 + bx lr +.align 2 +1: +.byte 27, 28, 29, 29, 30, 30, 30, 30, 31, 31, 31, 31, 31, 31, 31, 31 + FUNC_END ctzsi2 +#else +ARM_FUNC_START ctzsi2 + rsb r1, r0, #0 + and r0, r0, r1 +# if defined (__ARM_FEATURE_CLZ) + clz r0, r0 + rsb r0, r0, #31 + RET +# else + mov r1, #28 + cmp r0, #0x1 + do_it cs, t + movcs r0, r0, lsr #16 + subcs r1, r1, #16 + cmp r0, #0x100 + do_it cs, t + movcs r0, r0, lsr #8 + subcs r1, r1, #8 + cmp r0, #0x10 + do_it cs, t + movcs r0, r0, lsr #4 + subcs r1, r1, #4 + adr r2, 1f + ldrbr0, [r2, r0] + sub r0, r0, r1 + RET +.align 2 +1: +.byte 27, 28, 29, 29, 30, 30, 30, 30, 31, 31, 31, 31, 31, 31, 31, 31 +# endif /* !defined (__ARM_FEATURE_CLZ) */ + FUNC_END ctzsi2 +#endif +#endif /* L_clzsi2 */ + diff --git a/libgcc/config/arm/lib1funcs.S b/libgcc/config/arm/lib1funcs.S index d92f73ba0c9..b1df00ac597 100644 --- a/libgcc/config/arm/lib1funcs.S +++ b/libgcc/config/arm/lib1funcs.S @@ -1804,70 +1804,7 @@ LSYM(Lover12): #endif /* __symbian__ */ #include "clz2.S" - -#ifdef L_ctzsi2 -#ifdef NOT_ISA_TARGET_32BIT -FUNC_START ctzsi2 - negsr1, r0 - andsr0, r0, r1 - movsr1, #28 - movsr3, #1 - lslsr3, r3, #16 - cmp r0, r3 /* 0x1 */ - bcc 2f - lsrsr0, r0, #16 - subsr1, r1, #16 -2: lsrsr3, r3, #8 - cmp r0, r3 /* #0x100 */ - bcc 2f - lsrsr0, r0, #8 - subsr1, r1, #8 -2: lsrsr3, r3, #4 - cmp r0, r3 /* #0x10 */ - bcc 2f - lsrsr0, r0, #4 - subsr1, r1, #4 -2: adr r2, 1f - ldrbr0, [r2, r0] - subsr0, r0, r1 - bx lr -.align 2 -1: -.byte 27, 28, 29, 29, 30, 30, 30, 30, 31, 31, 31, 31, 31, 31, 31, 31 - FUNC_END ctzsi2 -#else -ARM_FUNC_START ctzsi2 - rsb r1, r0, #0 - and r0, r0, r1 -# if defined 
(__ARM_FEATURE_CLZ) - clz r0, r0 - rsb r0, r0, #31 - RET -# else - mov r1, #28 - cmp r0, #0x1 - do_it cs, t - movcs r0, r0, lsr #16 - subcs r1, r1, #16 - cmp r0, #0x100 - do_it cs, t - movcs r0, r0, lsr #8 - subcs r1, r1, #8 - cmp r0, #0x10 - do_it cs, t - movcs r0, r0, lsr #4 - subcs r1, r1, #4 - adr r2, 1f - ldrbr0, [r2, r0] - sub r0, r0, r1 - RET -.align 2 -1: -.byte 27, 28, 29, 29, 30, 30, 30, 30, 31, 31, 31, 31, 31, 31, 31, 31 -# endif /* !defined (_
[PATCH v5 08/33] Refactor 64-bit shift functions into a new file
This will make it easier to isolate changes in subsequent patches. gcc/libgcc/ChangeLog: 2021-01-13 Daniel Engel * config/arm/lib1funcs.S (__ashldi3, __ashrdi3, __lshldi3): Moved to ... * config/arm/eabi/lshift.S: New file. --- libgcc/config/arm/eabi/lshift.S | 123 libgcc/config/arm/lib1funcs.S | 103 +- 2 files changed, 124 insertions(+), 102 deletions(-) create mode 100644 libgcc/config/arm/eabi/lshift.S diff --git a/libgcc/config/arm/eabi/lshift.S b/libgcc/config/arm/eabi/lshift.S new file mode 100644 index 000..0974a72c377 --- /dev/null +++ b/libgcc/config/arm/eabi/lshift.S @@ -0,0 +1,123 @@ +/* Copyright (C) 1995-2021 Free Software Foundation, Inc. + +This file is free software; you can redistribute it and/or modify it +under the terms of the GNU General Public License as published by the +Free Software Foundation; either version 3, or (at your option) any +later version. + +This file is distributed in the hope that it will be useful, but +WITHOUT ANY WARRANTY; without even the implied warranty of +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU +General Public License for more details. + +Under Section 7 of GPL version 3, you are granted additional +permissions described in the GCC Runtime Library Exception, version +3.1, as published by the Free Software Foundation. + +You should have received a copy of the GNU General Public License and +a copy of the GCC Runtime Library Exception along with this program; +see the files COPYING3 and COPYING.RUNTIME respectively. If not, see +<http://www.gnu.org/licenses/>. */ + + +#ifdef L_lshrdi3 + + FUNC_START lshrdi3 + FUNC_ALIAS aeabi_llsr lshrdi3 + +#ifdef __thumb__ + lsrsal, r2 + movsr3, ah + lsrsah, r2 + mov ip, r3 + subsr2, #32 + lsrsr3, r2 + orrsal, r3 + negsr2, r2 + mov r3, ip + lslsr3, r2 + orrsal, r3 + RET +#else + subsr3, r2, #32 + rsb ip, r2, #32 + movmi al, al, lsr r2 + movpl al, ah, lsr r3 + orrmi al, al, ah, lsl ip + mov ah, ah, lsr r2 + RET +#endif + FUNC_END aeabi_llsr + FUNC_END lshrdi3 + +#endif + +#ifdef L_ashrdi3 + + FUNC_START ashrdi3 + FUNC_ALIAS aeabi_lasr ashrdi3 + +#ifdef __thumb__ + lsrsal, r2 + movsr3, ah + asrsah, r2 + subsr2, #32 + @ If r2 is negative at this point the following step would OR + @ the sign bit into all of AL. That's not what we want... + bmi 1f + mov ip, r3 + asrsr3, r2 + orrsal, r3 + mov r3, ip +1: + negsr2, r2 + lslsr3, r2 + orrsal, r3 + RET +#else + subsr3, r2, #32 + rsb ip, r2, #32 + movmi al, al, lsr r2 + movpl al, ah, asr r3 + orrmi al, al, ah, lsl ip + mov ah, ah, asr r2 + RET +#endif + + FUNC_END aeabi_lasr + FUNC_END ashrdi3 + +#endif + +#ifdef L_ashldi3 + + FUNC_START ashldi3 + FUNC_ALIAS aeabi_llsl ashldi3 + +#ifdef __thumb__ + lslsah, r2 + movsr3, al + lslsal, r2 + mov ip, r3 + subsr2, #32 + lslsr3, r2 + orrsah, r3 + negsr2, r2 + mov r3, ip + lsrsr3, r2 + orrsah, r3 + RET +#else + subsr3, r2, #32 + rsb ip, r2, #32 + movmi ah, ah, lsl r2 + movpl ah, al, lsl r3 + orrmi ah, ah, al, lsr ip + mov al, al, lsl r2 + RET +#endif + FUNC_END aeabi_llsl + FUNC_END ashldi3 + +#endif + diff --git a/libgcc/config/arm/lib1funcs.S b/libgcc/config/arm/lib1funcs.S index b1df00ac597..7ac50230725 100644 --- a/libgcc/config/arm/lib1funcs.S +++ b/libgcc/config/arm/lib1funcs.S @@ -1699,108 +1699,7 @@ LSYM(Lover12): /* Prevent __aeabi double-word shifts from being produced on SymbianOS. 
*/ #ifndef __symbian__ - -#ifdef L_lshrdi3 - - FUNC_START lshrdi3 - FUNC_ALIAS aeabi_llsr lshrdi3 - -#ifdef __thumb__ - lsrsal, r2 - movsr3, ah - lsrsah, r2 - mov ip, r3 - subsr2, #32 - lsrsr3, r2 - orrsal, r3 - negsr2, r2 - mov r3, ip - lslsr3, r2 - orrsal, r3 - RET -#else - subsr3, r2, #32 - rsb ip, r2, #32 - movmi al, al, lsr r2 - movpl al, ah, lsr r3 - orrmi al, al, ah, lsl ip - mov ah, ah, lsr r2 - RET -#endif - FUNC_END aeabi_llsr - FUNC_END lshrdi3 - -#endif - -#ifdef L_ashrdi3 - - FUNC_START ashrdi3 - FUNC_ALIAS aeabi_lasr ashrdi3 - -#ifdef __thumb__ - lsrsal, r2 - movsr3, ah - asrsah, r2 - subs
[PATCH v5 09/33] Import 'clz' functions from the CM0 library
On architectures without __ARM_FEATURE_CLZ, this version combines __clzdi2() with __clzsi2() into a single object with an efficient tail call. Also, this version merges the formerly separate Thumb and ARM code implementations into a unified instruction sequence. This change significantly improves Thumb performance without affecting ARM performance. Finally, this version adds a new __OPTIMIZE_SIZE__ build option (binary search loop). There is no change to the code for architectures with __ARM_FEATURE_CLZ. gcc/libgcc/ChangeLog: 2021-01-13 Daniel Engel * config/arm/bits/clz2.S (__clzsi2, __clzdi2): Reduced code size on architectures without __ARM_FEATURE_CLZ. * config/arm/t-elf (LIB1ASMFUNCS): Moved _clzsi2 to new weak roup. --- libgcc/config/arm/clz2.S | 362 +-- libgcc/config/arm/t-elf | 7 +- 2 files changed, 236 insertions(+), 133 deletions(-) diff --git a/libgcc/config/arm/clz2.S b/libgcc/config/arm/clz2.S index 2ad9a81892c..dc246708a82 100644 --- a/libgcc/config/arm/clz2.S +++ b/libgcc/config/arm/clz2.S @@ -1,145 +1,243 @@ -/* Copyright (C) 1995-2021 Free Software Foundation, Inc. +/* clz2.S: Cortex M0 optimized 'clz' functions -This file is free software; you can redistribute it and/or modify it -under the terms of the GNU General Public License as published by the -Free Software Foundation; either version 3, or (at your option) any -later version. + Copyright (C) 2018-2021 Free Software Foundation, Inc. + Contributed by Daniel Engel (g...@danielengel.com) -This file is distributed in the hope that it will be useful, but -WITHOUT ANY WARRANTY; without even the implied warranty of -MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU -General Public License for more details. + This file is free software; you can redistribute it and/or modify it + under the terms of the GNU General Public License as published by the + Free Software Foundation; either version 3, or (at your option) any + later version. -Under Section 7 of GPL version 3, you are granted additional -permissions described in the GCC Runtime Library Exception, version -3.1, as published by the Free Software Foundation. + This file is distributed in the hope that it will be useful, but + WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + General Public License for more details. -You should have received a copy of the GNU General Public License and -a copy of the GCC Runtime Library Exception along with this program; -see the files COPYING3 and COPYING.RUNTIME respectively. If not, see -<http://www.gnu.org/licenses/>. */ + Under Section 7 of GPL version 3, you are granted additional + permissions described in the GCC Runtime Library Exception, version + 3.1, as published by the Free Software Foundation. + + You should have received a copy of the GNU General Public License and + a copy of the GCC Runtime Library Exception along with this program; + see the files COPYING3 and COPYING.RUNTIME respectively. If not, see + <http://www.gnu.org/licenses/>. */ + + +#if defined(__ARM_FEATURE_CLZ) && __ARM_FEATURE_CLZ + +#ifdef L_clzdi2 + +// int __clzdi2(long long) +// Counts leading zero bits in $r1:$r0. +// Returns the result in $r0. 
+FUNC_START_SECTION clzdi2 .text.sorted.libgcc.clz2.clzdi2 +CFI_START_FUNCTION + +// Moved here from lib1funcs.S +cmp xxh,#0 +do_it eq, et +clzeq r0, xxl +clzne r0, xxh +addeq r0, #32 +RET + +CFI_END_FUNCTION +FUNC_END clzdi2 + +#endif /* L_clzdi2 */ #ifdef L_clzsi2 -#ifdef NOT_ISA_TARGET_32BIT -FUNC_START clzsi2 - movsr1, #28 - movsr3, #1 - lslsr3, r3, #16 - cmp r0, r3 /* 0x1 */ - bcc 2f - lsrsr0, r0, #16 - subsr1, r1, #16 -2: lsrsr3, r3, #8 - cmp r0, r3 /* #0x100 */ - bcc 2f - lsrsr0, r0, #8 - subsr1, r1, #8 -2: lsrsr3, r3, #4 - cmp r0, r3 /* #0x10 */ - bcc 2f - lsrsr0, r0, #4 - subsr1, r1, #4 -2: adr r2, 1f - ldrbr0, [r2, r0] - addsr0, r0, r1 - bx lr -.align 2 -1: -.byte 4, 3, 2, 2, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0 - FUNC_END clzsi2 -#else -ARM_FUNC_START clzsi2 -# if defined (__ARM_FEATURE_CLZ) - clz r0, r0 - RET -# else - mov r1, #28 - cmp r0, #0x1 - do_it cs, t - movcs r0, r0, lsr #16 - subcs r1, r1, #16 - cmp r0, #0x100 - do_it cs, t - movcs r0, r0, lsr #8 - subcs r1, r1, #8 - cmp r0, #0x10 - do_it cs, t - movcs r0, r0, lsr #4 - subcs r1, r1, #4 - adr r2, 1f - ldrbr0, [r2, r0] - add r0, r0, r1 - RET -.align
[PATCH v5 10/33] Import 'ctz' functions from the CM0 library
This version combines __ctzdi2() with __ctzsi2() into a single object with an efficient tail call. The former implementation of __ctzdi2() was in C. On architectures without __ARM_FEATURE_CLZ, this version merges the formerly separate Thumb and ARM code sequences into a unified instruction sequence. This change significantly improves Thumb performance without affecting ARM performance. Finally, this version adds a new __OPTIMIZE_SIZE__ build option. On architectures with __ARM_FEATURE_CLZ, __ctzsi2(0) now returns 32. Formerly, __ctzsi2(0) would return -1. Architectures without __ARM_FEATURE_CLZ have always returned 32, so this change makes the return value consistent. This change costs 2 extra instructions (branchless). Likewise on architectures with __ARM_FEATURE_CLZ, __ctzdi2(0) now returns 64 instead of 31. gcc/libgcc/ChangeLog: 2021-01-13 Daniel Engel * config/arm/bits/ctz2.S (__ctzdi2): Added a new function. (__clzsi2): Reduced size on architectures without __ARM_FEATURE_CLZ; changed so __clzsi2(0)=32 on architectures wtih __ARM_FEATURE_CLZ. * config/arm/t-elf (LIB1ASMFUNCS): Added _ctzdi2; moved _ctzsi2 to the weak function objects group. --- libgcc/config/arm/ctz2.S | 307 +-- libgcc/config/arm/t-elf | 3 +- 2 files changed, 232 insertions(+), 78 deletions(-) diff --git a/libgcc/config/arm/ctz2.S b/libgcc/config/arm/ctz2.S index 8702c9afb94..ee6df6d6d01 100644 --- a/libgcc/config/arm/ctz2.S +++ b/libgcc/config/arm/ctz2.S @@ -1,86 +1,239 @@ -/* Copyright (C) 1995-2021 Free Software Foundation, Inc. +/* ctz2.S: ARM optimized 'ctz' functions -This file is free software; you can redistribute it and/or modify it -under the terms of the GNU General Public License as published by the -Free Software Foundation; either version 3, or (at your option) any -later version. + Copyright (C) 2020-2021 Free Software Foundation, Inc. + Contributed by Daniel Engel (g...@danielengel.com) -This file is distributed in the hope that it will be useful, but -WITHOUT ANY WARRANTY; without even the implied warranty of -MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU -General Public License for more details. + This file is free software; you can redistribute it and/or modify it + under the terms of the GNU General Public License as published by the + Free Software Foundation; either version 3, or (at your option) any + later version. -Under Section 7 of GPL version 3, you are granted additional -permissions described in the GCC Runtime Library Exception, version -3.1, as published by the Free Software Foundation. + This file is distributed in the hope that it will be useful, but + WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + General Public License for more details. -You should have received a copy of the GNU General Public License and -a copy of the GCC Runtime Library Exception along with this program; -see the files COPYING3 and COPYING.RUNTIME respectively. If not, see -<http://www.gnu.org/licenses/>. */ + Under Section 7 of GPL version 3, you are granted additional + permissions described in the GCC Runtime Library Exception, version + 3.1, as published by the Free Software Foundation. + You should have received a copy of the GNU General Public License and + a copy of the GCC Runtime Library Exception along with this program; + see the files COPYING3 and COPYING.RUNTIME respectively. If not, see + <http://www.gnu.org/licenses/>. 
*/ -#ifdef L_ctzsi2 -#ifdef NOT_ISA_TARGET_32BIT -FUNC_START ctzsi2 - negsr1, r0 - andsr0, r0, r1 - movsr1, #28 - movsr3, #1 - lslsr3, r3, #16 - cmp r0, r3 /* 0x1 */ - bcc 2f - lsrsr0, r0, #16 - subsr1, r1, #16 -2: lsrsr3, r3, #8 - cmp r0, r3 /* #0x100 */ - bcc 2f - lsrsr0, r0, #8 - subsr1, r1, #8 -2: lsrsr3, r3, #4 - cmp r0, r3 /* #0x10 */ - bcc 2f - lsrsr0, r0, #4 - subsr1, r1, #4 -2: adr r2, 1f - ldrbr0, [r2, r0] - subsr0, r0, r1 - bx lr -.align 2 -1: -.byte 27, 28, 29, 29, 30, 30, 30, 30, 31, 31, 31, 31, 31, 31, 31, 31 - FUNC_END ctzsi2 + +// When the hardware 'ctz' function is available, an efficient version +// of __ctzsi2(x) can be created by calculating '31 - __ctzsi2(lsb(x))', +// where lsb(x) is 'x' with only the least-significant '1' bit set. +// The following offset applies to all of the functions in this file. +#if defined(__ARM_FEATURE_CLZ) && __ARM_FEATURE_CLZ + #define CTZ_RESULT_OFFSET 1 #else -ARM_FUNC_START ctzsi2 - rsb r1, r0, #0 - and r0, r0, r1 -# if defined (__ARM_FEATURE_CLZ) - clz r0, r0 - rsb
[PATCH v5 11/33] Import 64-bit shift functions from the CM0 library
The Thumb versions of these functions are each 1-2 instructions smaller and faster, and branchless when the IT instruction is available. The ARM versions were converted to the "xxl/xxh" big-endian register naming convention, but are otherwise unchanged. gcc/libgcc/ChangeLog: 2021-01-13 Daniel Engel * config/arm/bits/shift.S (__ashldi3, __ashrdi3, __lshldi3): Reduced code size on Thumb architectures; updated big-endian register naming convention to "xxl/xxh". --- libgcc/config/arm/eabi/lshift.S | 338 +--- 1 file changed, 228 insertions(+), 110 deletions(-) diff --git a/libgcc/config/arm/eabi/lshift.S b/libgcc/config/arm/eabi/lshift.S index 0974a72c377..16cf2dcef04 100644 --- a/libgcc/config/arm/eabi/lshift.S +++ b/libgcc/config/arm/eabi/lshift.S @@ -1,123 +1,241 @@ -/* Copyright (C) 1995-2021 Free Software Foundation, Inc. +/* lshift.S: ARM optimized 64-bit integer shift -This file is free software; you can redistribute it and/or modify it -under the terms of the GNU General Public License as published by the -Free Software Foundation; either version 3, or (at your option) any -later version. + Copyright (C) 2018-2021 Free Software Foundation, Inc. + Contributed by Daniel Engel, Senva Inc (g...@danielengel.com) -This file is distributed in the hope that it will be useful, but -WITHOUT ANY WARRANTY; without even the implied warranty of -MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU -General Public License for more details. + This file is free software; you can redistribute it and/or modify it + under the terms of the GNU General Public License as published by the + Free Software Foundation; either version 3, or (at your option) any + later version. -Under Section 7 of GPL version 3, you are granted additional -permissions described in the GCC Runtime Library Exception, version -3.1, as published by the Free Software Foundation. + This file is distributed in the hope that it will be useful, but + WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + General Public License for more details. -You should have received a copy of the GNU General Public License and -a copy of the GCC Runtime Library Exception along with this program; -see the files COPYING3 and COPYING.RUNTIME respectively. If not, see -<http://www.gnu.org/licenses/>. */ + Under Section 7 of GPL version 3, you are granted additional + permissions described in the GCC Runtime Library Exception, version + 3.1, as published by the Free Software Foundation. + + You should have received a copy of the GNU General Public License and + a copy of the GCC Runtime Library Exception along with this program; + see the files COPYING3 and COPYING.RUNTIME respectively. If not, see + <http://www.gnu.org/licenses/>. */ #ifdef L_lshrdi3 - FUNC_START lshrdi3 - FUNC_ALIAS aeabi_llsr lshrdi3 - -#ifdef __thumb__ - lsrsal, r2 - movsr3, ah - lsrsah, r2 - mov ip, r3 - subsr2, #32 - lsrsr3, r2 - orrsal, r3 - negsr2, r2 - mov r3, ip - lslsr3, r2 - orrsal, r3 - RET -#else - subsr3, r2, #32 - rsb ip, r2, #32 - movmi al, al, lsr r2 - movpl al, ah, lsr r3 - orrmi al, al, ah, lsl ip - mov ah, ah, lsr r2 - RET -#endif - FUNC_END aeabi_llsr - FUNC_END lshrdi3 - -#endif - +// long long __aeabi_llsr(long long, int) +// Logical shift right the 64 bit value in $r1:$r0 by the count in $r2. +// The result is only guaranteed for shifts in the range of '0' to '63'. +// Uses $r3 as scratch space. 
+FUNC_START_SECTION aeabi_llsr .text.sorted.libgcc.lshrdi3 +FUNC_ALIAS lshrdi3 aeabi_llsr +CFI_START_FUNCTION + + #if defined(__thumb__) && __thumb__ + +// Save a copy for the remainder. +movsr3, xxh + +// Assume a simple shift. +lsrsxxl,r2 +lsrsxxh,r2 + +// Test if the shift distance is larger than 1 word. +subsr2, #32 + +#ifdef __HAVE_FEATURE_IT +do_it lo,te + +// The remainder is opposite the main shift, (32 - x) bits. +rsblo r2, #0 +lsllo r3, r2 + +// The remainder shift extends into the hi word. +lsrhs r3, r2 + +#else /* !__HAVE_FEATURE_IT */ +bhs LLSYM(__llsr_large) + +// The remainder is opposite the main shift, (32 - x) bits. +rsbsr2, #0 +lslsr3, r2 + +// Cancel any remaining shift. +eorsr2, r2 + + LLSYM(__llsr_large): +// Apply any remaining shift to the hi word. +lsrsr3, r2 + +#endif /* !__HAVE_FEATURE_IT */ + +// Merge remainder and
[PATCH v5 12/33] Import 'clrsb' functions from the CM0 library
This implementation provides an efficient tail call to __clzsi2(), making the functions rather smaller and faster than the C versions. gcc/libgcc/ChangeLog: 2021-01-13 Daniel Engel * config/arm/bits/clz2.S (__clrsbsi2, __clrsbdi2): Added new functions. * config/arm/t-elf (LIB1ASMFUNCS): Added new function objects _clrsbsi2 and _clrsbdi2). --- libgcc/config/arm/clz2.S | 108 ++- libgcc/config/arm/t-elf | 2 + 2 files changed, 108 insertions(+), 2 deletions(-) diff --git a/libgcc/config/arm/clz2.S b/libgcc/config/arm/clz2.S index dc246708a82..5f608c0c2a3 100644 --- a/libgcc/config/arm/clz2.S +++ b/libgcc/config/arm/clz2.S @@ -1,4 +1,4 @@ -/* clz2.S: Cortex M0 optimized 'clz' functions +/* clz2.S: ARM optimized 'clz' and related functions Copyright (C) 2018-2021 Free Software Foundation, Inc. Contributed by Daniel Engel (g...@danielengel.com) @@ -23,7 +23,7 @@ <http://www.gnu.org/licenses/>. */ -#if defined(__ARM_FEATURE_CLZ) && __ARM_FEATURE_CLZ +#ifdef __ARM_FEATURE_CLZ #ifdef L_clzdi2 @@ -241,3 +241,107 @@ FUNC_END clzdi2 #endif /* !__ARM_FEATURE_CLZ */ + +#ifdef L_clrsbdi2 + +// int __clrsbdi2(int) +// Counts the number of "redundant sign bits" in $r1:$r0. +// Returns the result in $r0. +// Uses $r2 and $r3 as scratch space. +FUNC_START_SECTION clrsbdi2 .text.sorted.libgcc.clz2.clrsbdi2 +CFI_START_FUNCTION + + #if defined(__ARM_FEATURE_CLZ) && __ARM_FEATURE_CLZ +// Invert negative signs to keep counting zeros. +asrsr3, xxh,#31 +eorsxxl,r3 +eorsxxh,r3 + +// Same as __clzdi2(), except that the 'C' flag is pre-calculated. +// Also, the trailing 'subs', since the last bit is not redundant. +do_it eq, et +clzeq r0, xxl +clzne r0, xxh +addeq r0, #32 +subsr0, #1 +RET + + #else /* !__ARM_FEATURE_CLZ */ +// Result if all the bits in the argument are zero. +// Set it here to keep the flags clean after 'eors' below. +movsr2, #31 + +// Invert negative signs to keep counting zeros. +asrsr3, xxh,#31 +eorsxxh,r3 + +#if defined(__ARMEB__) && __ARMEB__ +// If the upper word is non-zero, return '__clzsi2(upper) - 1'. +bne SYM(__internal_clzsi2) + +// The upper word is zero, prepare the lower word. +movsr0, r1 +eorsr0, r3 + +#else /* !__ARMEB__ */ +// Save the lower word temporarily. +// This somewhat awkward construction adds one cycle when the +// branch is not taken, but prevents a double-branch. +eorsr3, r0 + +// If the upper word is non-zero, return '__clzsi2(upper) - 1'. +movsr0, r1 +bneSYM(__internal_clzsi2) + +// Restore the lower word. +movsr0, r3 + +#endif /* !__ARMEB__ */ + +// The upper word is zero, return '31 + __clzsi2(lower)'. +addsr2, #32 +b SYM(__internal_clzsi2) + + #endif /* !__ARM_FEATURE_CLZ */ + +CFI_END_FUNCTION +FUNC_END clrsbdi2 + +#endif /* L_clrsbdi2 */ + + +#ifdef L_clrsbsi2 + +// int __clrsbsi2(int) +// Counts the number of "redundant sign bits" in $r0. +// Returns the result in $r0. +// Uses $r2 and possibly $r3 as scratch space. +FUNC_START_SECTION clrsbsi2 .text.sorted.libgcc.clz2.clrsbsi2 +CFI_START_FUNCTION + +// Invert negative signs to keep counting zeros. +asrsr2, r0,#31 +eorsr0, r2 + + #if defined(__ARM_FEATURE_CLZ) && __ARM_FEATURE_CLZ +// Count. +clz r0, r0 + +// The result for a positive value will always be >= 1. +// By definition, the last bit is not redundant. +subsr0, #1 +RET + + #else /* !__ARM_FEATURE_CLZ */ +// Result if all the bits in the argument are zero. +// By definition, the last bit is not redundant. 
+movsr2, #31 +b SYM(__internal_clzsi2) + + #endif /* !__ARM_FEATURE_CLZ */ + +CFI_END_FUNCTION +FUNC_END clrsbsi2 + +#endif /* L_clrsbsi2 */ + diff --git a/libgcc/config/arm/t-elf b/libgcc/config/arm/t-elf index 33b83ac4adf..89071cebe45 100644 --- a/libgcc/config/arm/t-elf +++ b/libgcc/config/arm/t-elf @@ -31,6 +31,8 @@ LIB1ASMFUNCS += \ _ashldi3 \ _ashrdi3 \ _lshrdi3 \ + _clrsbsi2 \ + _clrsbdi2 \ _clzdi2 \ _ctzdi2 \ _dvmd_tls \ -- 2.25.1
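The 'clrsb' tail call builds on a small identity, shown here as a hypothetical C sketch (it relies on __clzsi2(0) returning 32, which the assembly above guarantees, and on GCC's arithmetic right shift of signed values):

    extern int __clzsi2 (unsigned int);

    /* clrsb(x): count of leading bits that merely repeat the sign bit. */
    int clrsbsi2 (int x)
    {
        /* XOR with the sign fill turns redundant sign bits into zeros,
           for negative and non-negative inputs alike. */
        unsigned int y = (unsigned int) (x ^ (x >> 31));

        /* The sign bit itself is not redundant, hence the -1. */
        return __clzsi2 (y) - 1;
    }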
[PATCH v5 13/33] Import 'ffs' functions from the CM0 library
This implementation provides an efficient tail call to __ctzsi2(), making the functions rather smaller and faster than the C versions.

gcc/libgcc/ChangeLog:
2021-01-13 Daniel Engel

	* config/arm/bits/ctz2.S (__ffssi2, __ffsdi2): New functions.
	* config/arm/t-elf (LIB1ASMFUNCS): Added _ffssi2 and _ffsdi2.
---
 libgcc/config/arm/ctz2.S | 77 +++-
 libgcc/config/arm/t-elf | 2 ++
 2 files changed, 78 insertions(+), 1 deletion(-)

diff --git a/libgcc/config/arm/ctz2.S b/libgcc/config/arm/ctz2.S
index ee6df6d6d01..545f8f94d71 100644
--- a/libgcc/config/arm/ctz2.S
+++ b/libgcc/config/arm/ctz2.S
@@ -1,4 +1,4 @@
-/* ctz2.S: ARM optimized 'ctz' functions
+/* ctz2.S: ARM optimized 'ctz' and related functions

 Copyright (C) 2020-2021 Free Software Foundation, Inc.
 Contributed by Daniel Engel (g...@danielengel.com)
@@ -237,3 +237,78 @@ FUNC_END ctzdi2

 #endif /* L_ctzsi2 || L_ctzdi2 */

+
+#ifdef L_ffsdi2
+
+// int __ffsdi2(int)
+// Return the index of the least significant 1-bit in $r1:r0,
+// or zero if $r1:r0 is zero.  The least significant bit is index 1.
+// Returns the result in $r0.
+// Uses $r2 and possibly $r3 as scratch space.
+// Same section as __ctzsi2() for sake of the tail call branches.
+FUNC_START_SECTION ffsdi2 .text.sorted.libgcc.ctz2.ffsdi2
+CFI_START_FUNCTION
+
+	// Simplify branching by assuming a non-zero lower word.
+	// For all such, ffssi2(x) == ctzsi2(x) + 1.
+	movs	r2,	#(33 - CTZ_RESULT_OFFSET)
+
+  #if defined(__ARMEB__) && __ARMEB__
+	// HACK: Save the upper word in a scratch register.
+	movs	r3,	r0
+
+	// Test the lower word.
+	movs	r0,	r1
+	bne	SYM(__internal_ctzsi2)
+
+	// Test the upper word.
+	movs	r2,	#(65 - CTZ_RESULT_OFFSET)
+	movs	r0,	r3
+	bne	SYM(__internal_ctzsi2)
+
+  #else /* !__ARMEB__ */
+	// Test the lower word.
+	cmp	r0,	#0
+	bne	SYM(__internal_ctzsi2)
+
+	// Test the upper word.
+	movs	r2,	#(65 - CTZ_RESULT_OFFSET)
+	movs	r0,	r1
+	bne	SYM(__internal_ctzsi2)
+
+  #endif /* !__ARMEB__ */
+
+	// Upper and lower words are both zero.
+	RET
+
+CFI_END_FUNCTION
+FUNC_END ffsdi2
+
+#endif /* L_ffsdi2 */
+
+
+#ifdef L_ffssi2
+
+// int __ffssi2(int)
+// Return the index of the least significant 1-bit in $r0,
+// or zero if $r0 is zero.  The least significant bit is index 1.
+// Returns the result in $r0.
+// Uses $r2 and possibly $r3 as scratch space.
+// Same section as __ctzsi2() for sake of the tail call branches.
+FUNC_START_SECTION ffssi2 .text.sorted.libgcc.ctz2.ffssi2
+CFI_START_FUNCTION
+
+	// Simplify branching by assuming a non-zero argument.
+	// For all such, ffssi2(x) == ctzsi2(x) + 1.
+	movs	r2,	#(33 - CTZ_RESULT_OFFSET)
+
+	// Test for zero, return unmodified.
+	cmp	r0,	#0
+	bne	SYM(__internal_ctzsi2)
+	RET
+
+CFI_END_FUNCTION
+FUNC_END ffssi2
+
+#endif /* L_ffssi2 */
+
diff --git a/libgcc/config/arm/t-elf b/libgcc/config/arm/t-elf
index 89071cebe45..346fc766f17 100644
--- a/libgcc/config/arm/t-elf
+++ b/libgcc/config/arm/t-elf
@@ -35,6 +35,8 @@ LIB1ASMFUNCS += \
 	_clrsbdi2 \
 	_clzdi2 \
 	_ctzdi2 \
+	_ffssi2 \
+	_ffsdi2 \
 	_dvmd_tls \
 	_divsi3 \
 	_modsi3 \
--
2.25.1
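The relationship exploited above, as a hypothetical C sketch (the assembly returns 0 for a zero argument by falling through to RET instead of taking the tail call):

    extern int __ctzsi2 (unsigned int);

    /* ffs(x): 1-based index of the least significant set bit, 0 if none. */
    int ffssi2 (unsigned int x)
    {
        return x ? __ctzsi2 (x) + 1 : 0;
    }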
[PATCH v5 14/33] Import 'parity' functions from the CM0 library
The functional overlap between the single- and double-word functions makes this implementation about half the size of the C functions if both functions are linked in the same application.

gcc/libgcc/ChangeLog:
2021-01-13 Daniel Engel

	* config/arm/parity.S: New file for __paritysi2/di2().
	* config/arm/lib1funcs.S: #include bit/parity.S
	* config/arm/t-elf (LIB1ASMFUNCS): Added _paritysi2/di2.
---
 libgcc/config/arm/lib1funcs.S | 1 +
 libgcc/config/arm/parity.S | 120 ++
 libgcc/config/arm/t-elf | 2 +
 3 files changed, 123 insertions(+)
 create mode 100644 libgcc/config/arm/parity.S

diff --git a/libgcc/config/arm/lib1funcs.S b/libgcc/config/arm/lib1funcs.S
index 7ac50230725..600ea2dfdc9 100644
--- a/libgcc/config/arm/lib1funcs.S
+++ b/libgcc/config/arm/lib1funcs.S
@@ -1704,6 +1704,7 @@ LSYM(Lover12):

 #include "clz2.S"
 #include "ctz2.S"
+#include "parity.S"

 /* */
 /* These next two sections are here despite the fact that they contain Thumb

diff --git a/libgcc/config/arm/parity.S b/libgcc/config/arm/parity.S
new file mode 100644
index 000..45233bc9d8f
--- /dev/null
+++ b/libgcc/config/arm/parity.S
@@ -0,0 +1,120 @@
+/* parity.S: ARM optimized parity functions
+
+   Copyright (C) 2020-2021 Free Software Foundation, Inc.
+   Contributed by Daniel Engel (g...@danielengel.com)
+
+   This file is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by the
+   Free Software Foundation; either version 3, or (at your option) any
+   later version.
+
+   This file is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>. */
+
+
+#ifdef L_paritydi2
+
+// int __paritydi2(int)
+// Returns '0' if the number of bits set in $r1:r0 is even, and '1' otherwise.
+// Returns the result in $r0.
+FUNC_START_SECTION paritydi2 .text.sorted.libgcc.paritydi2
+CFI_START_FUNCTION
+
+	// Combine the upper and lower words, then fall through.
+	// Byte-endianness does not matter for this function.
+	eors	r0,	r1
+
+#endif /* L_paritydi2 */
+
+
+// The implementation of __paritydi2() tightly couples with __paritysi2(),
+// such that instructions must appear consecutively in the same memory
+// section for proper flow control.  However, this construction inhibits
+// the ability to discard __paritydi2() when only using __paritysi2().
+// Therefore, this block configures __paritysi2() for compilation twice.
+// The first version is a minimal standalone implementation, and the second
+// version is the continuation of __paritydi2().  The standalone version must
+// be declared WEAK, so that the combined version can supersede it and
+// provide both symbols when required.
+// '_paritysi2' should appear before '_paritydi2' in LIB1ASMFUNCS.
+#if defined(L_paritysi2) || defined(L_paritydi2)
+
+#ifdef L_paritysi2
+// int __paritysi2(int)
+// Returns '0' if the number of bits set in $r0 is even, and '1' otherwise.
+// Returns the result in $r0.
+// Uses $r2 as scratch space.
+WEAK_START_SECTION paritysi2 .text.sorted.libgcc.paritysi2 +CFI_START_FUNCTION + +#else /* L_paritydi2 */ +FUNC_ENTRY paritysi2 + +#endif + + #if defined(__thumb__) && __thumb__ +#if defined(__OPTIMIZE_SIZE__) && __OPTIMIZE_SIZE__ + +// Size optimized: 16 bytes, 40 cycles +// Speed optimized: 24 bytes, 14 cycles +movsr2, #16 + +LLSYM(__parity_loop): +// Calculate the parity of successively smaller half-words into the MSB. +movsr1, r0 +lslsr1, r2 +eorsr0, r1 +lsrsr2, #1 +bne LLSYM(__parity_loop) + +#else /* !__OPTIMIZE_SIZE__ */ + +// Unroll the loop. The 'libgcc' reference C implementation replaces +// the x2 and the x1 shifts with a constant. However, since it takes +// 4 cycles to load, index, and mask the constant result, it doesn't +// cost anything to keep shifting (and saves a few bytes). +lslsr1, r0, #16 +eorsr0, r1 +lslsr1, r0,
[PATCH v5 15/33] Import 'popcnt' functions from the CM0 library
The functional overlap between the single- and double-word functions makes this implementation about 30% smaller than the C functions if both functions are linked together in the same application.

gcc/libgcc/ChangeLog:
2021-01-13 Daniel Engel

	* config/arm/popcnt.S (__popcountsi2, __popcountdi2): New file.
	* config/arm/lib1funcs.S: #include bit/popcnt.S
	* config/arm/t-elf (LIB1ASMFUNCS): Add _popcountsi2/di2.
---
 libgcc/config/arm/lib1funcs.S |   1 +
 libgcc/config/arm/popcnt.S    | 189 ++
 libgcc/config/arm/t-elf       |   2 +
 3 files changed, 192 insertions(+)
 create mode 100644 libgcc/config/arm/popcnt.S

diff --git a/libgcc/config/arm/lib1funcs.S b/libgcc/config/arm/lib1funcs.S
index 600ea2dfdc9..bd84a3e4281 100644
--- a/libgcc/config/arm/lib1funcs.S
+++ b/libgcc/config/arm/lib1funcs.S
@@ -1705,6 +1705,7 @@ LSYM(Lover12):
 #include "clz2.S"
 #include "ctz2.S"
 #include "parity.S"
+#include "popcnt.S"
 /* */
 /* These next two sections are here despite the fact that they contain Thumb

diff --git a/libgcc/config/arm/popcnt.S b/libgcc/config/arm/popcnt.S
new file mode 100644
index 000..51b1ed745ee
--- /dev/null
+++ b/libgcc/config/arm/popcnt.S
@@ -0,0 +1,189 @@
+/* popcnt.S: ARM optimized popcount functions
+
+   Copyright (C) 2020-2021 Free Software Foundation, Inc.
+   Contributed by Daniel Engel (g...@danielengel.com)
+
+   This file is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by the
+   Free Software Foundation; either version 3, or (at your option) any
+   later version.
+
+   This file is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+
+#ifdef L_popcountdi2
+
+// int __popcountdi2(int)
+// Returns the number of bits set in $r1:$r0.
+// Returns the result in $r0.
+FUNC_START_SECTION popcountdi2 .text.sorted.libgcc.popcountdi2
+CFI_START_FUNCTION
+
+  #if defined(__OPTIMIZE_SIZE__) && __OPTIMIZE_SIZE__
+        // Initialize the result.
+        // Compensate for the two extra loop iterations (one for each word)
+        // required to detect zero arguments.
+        movs    r2, #2
+
+LLSYM(__popcountd_loop):
+        // Same as __popcounts_loop below, except for $r1.
+        subs    r2, #1
+        subs    r3, r1, #1
+        ands    r1, r3
+        bcs     LLSYM(__popcountd_loop)
+
+        // Repeat the operation for the second word.
+        b       LLSYM(__popcounts_loop)
+
+  #else /* !__OPTIMIZE_SIZE__ */
+        // Load the one-bit alternating mask.
+        ldr     r3, =0x55555555
+
+        // Reduce the second word.
+        lsrs    r2, r1, #1
+        ands    r2, r3
+        subs    r1, r2
+
+        // Reduce the first word.
+        lsrs    r2, r0, #1
+        ands    r2, r3
+        subs    r0, r2
+
+        // Load the two-bit alternating mask.
+        ldr     r3, =0x33333333
+
+        // Reduce the second word.
+        lsrs    r2, r1, #2
+        ands    r2, r3
+        ands    r1, r3
+        adds    r1, r2
+
+        // Reduce the first word.
+        lsrs    r2, r0, #2
+        ands    r2, r3
+        ands    r0, r3
+        adds    r0, r2
+
+        // There will be a maximum of 8 bits in each 4-bit field.
+        // Jump into the single word flow to combine and complete.
+        b       LLSYM(__popcounts_merge)
+
+  #endif /* !__OPTIMIZE_SIZE__ */
+#endif /* L_popcountdi2 */
+
+
+// The implementation of __popcountdi2() tightly couples with __popcountsi2(),
+// such that instructions must appear consecutively in the same memory
+// section for proper flow control.  However, this construction inhibits
+// the ability to discard __popcountdi2() when only using __popcountsi2().
+// Therefore, this block configures __popcountsi2() for compilation twice.
+// The first version is a minimal standalone implementation, and the second
+// version is the continuation of __popcountdi2().  The standalone version must
+// be declared WEAK, so that the combined version can supersede it and
+// provide both symbols when required.
+// '_popcountsi2' should appear b
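The masked reductions above follow the standard bit-slicing popcount. Roughly, in C (a reference sketch only; the assembly merges the two words and finishes the reduction differently than the byte-fold shown here):

    /* Classic divide-and-conquer popcount for one 32-bit word.  */
    unsigned int popcount32_ref (unsigned int x)
    {
      x = x - ((x >> 1) & 0x55555555);                 /* 2-bit field sums */
      x = (x & 0x33333333) + ((x >> 2) & 0x33333333);  /* 4-bit field sums */
      x = (x + (x >> 4)) & 0x0f0f0f0f;                 /* 8-bit field sums */
      return (x * 0x01010101) >> 24;                   /* fold the bytes  */
    }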
[PATCH v5 16/33] Refactor Thumb-1 64-bit comparison into a new file
This will make it easier to isolate changes in subsequent patches. gcc/libgcc/ChangeLog: 2021-01-13 Daniel Engel * config/arm/bpabi-v6m.S (__aeabi_lcmp, __aeabi_ulcmp): Moved to ... * config/arm/eabi/lcmp.S: New file. * config/arm/lib1funcs.S: #include eabi/lcmp.S. --- libgcc/config/arm/bpabi-v6m.S | 46 -- libgcc/config/arm/eabi/lcmp.S | 73 +++ libgcc/config/arm/lib1funcs.S | 1 + 3 files changed, 74 insertions(+), 46 deletions(-) create mode 100644 libgcc/config/arm/eabi/lcmp.S diff --git a/libgcc/config/arm/bpabi-v6m.S b/libgcc/config/arm/bpabi-v6m.S index 069fcbbf48c..a051c1530a4 100644 --- a/libgcc/config/arm/bpabi-v6m.S +++ b/libgcc/config/arm/bpabi-v6m.S @@ -33,52 +33,6 @@ .eabi_attribute 25, 1 #endif /* __ARM_EABI__ */ -#ifdef L_aeabi_lcmp - -FUNC_START aeabi_lcmp - cmp xxh, yyh - beq 1f - bgt 2f - movsr0, #1 - negsr0, r0 - RET -2: - movsr0, #1 - RET -1: - subsr0, xxl, yyl - beq 1f - bhi 2f - movsr0, #1 - negsr0, r0 - RET -2: - movsr0, #1 -1: - RET - FUNC_END aeabi_lcmp - -#endif /* L_aeabi_lcmp */ - -#ifdef L_aeabi_ulcmp - -FUNC_START aeabi_ulcmp - cmp xxh, yyh - bne 1f - subsr0, xxl, yyl - beq 2f -1: - bcs 1f - movsr0, #1 - negsr0, r0 - RET -1: - movsr0, #1 -2: - RET - FUNC_END aeabi_ulcmp - -#endif /* L_aeabi_ulcmp */ .macro test_div_by_zero signed cmp yyh, #0 diff --git a/libgcc/config/arm/eabi/lcmp.S b/libgcc/config/arm/eabi/lcmp.S new file mode 100644 index 000..336db1d398c --- /dev/null +++ b/libgcc/config/arm/eabi/lcmp.S @@ -0,0 +1,73 @@ +/* Miscellaneous BPABI functions. Thumb-1 implementation, suitable for ARMv4T, + ARMv6-M and ARMv8-M Baseline like ISA variants. + + Copyright (C) 2006-2020 Free Software Foundation, Inc. + Contributed by CodeSourcery. + + This file is free software; you can redistribute it and/or modify it + under the terms of the GNU General Public License as published by the + Free Software Foundation; either version 3, or (at your option) any + later version. + + This file is distributed in the hope that it will be useful, but + WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + General Public License for more details. + + Under Section 7 of GPL version 3, you are granted additional + permissions described in the GCC Runtime Library Exception, version + 3.1, as published by the Free Software Foundation. + + You should have received a copy of the GNU General Public License and + a copy of the GCC Runtime Library Exception along with this program; + see the files COPYING3 and COPYING.RUNTIME respectively. If not, see + <http://www.gnu.org/licenses/>. */ + + +#ifdef L_aeabi_lcmp + +FUNC_START aeabi_lcmp +cmp xxh, yyh +beq 1f +bgt 2f +movsr0, #1 +negsr0, r0 +RET +2: +movsr0, #1 +RET +1: +subsr0, xxl, yyl +beq 1f +bhi 2f +movsr0, #1 +negsr0, r0 +RET +2: +movsr0, #1 +1: +RET +FUNC_END aeabi_lcmp + +#endif /* L_aeabi_lcmp */ + +#ifdef L_aeabi_ulcmp + +FUNC_START aeabi_ulcmp +cmp xxh, yyh +bne 1f +subsr0, xxl, yyl +beq 2f +1: +bcs 1f +movsr0, #1 +negsr0, r0 +RET +1: +movsr0, #1 +2: +RET +FUNC_END aeabi_ulcmp + +#endif /* L_aeabi_ulcmp */ + diff --git a/libgcc/config/arm/lib1funcs.S b/libgcc/config/arm/lib1funcs.S index bd84a3e4281..5e24d0a6749 100644 --- a/libgcc/config/arm/lib1funcs.S +++ b/libgcc/config/arm/lib1funcs.S @@ -1991,5 +1991,6 @@ LSYM(Lchange_\register): #include "bpabi.S" #else /* NOT_ISA_TARGET_32BIT */ #include "bpabi-v6m.S" +#include "eabi/lcmp.S" #endif /* NOT_ISA_TARGET_32BIT */ #endif /* !__symbian__ */ -- 2.25.1
[PATCH v5 17/33] Import 64-bit comparison from CM0 library
These are 2-5 instructions smaller and just as fast. Branches are minimized, which will allow easier adaptation to Thumb-2/ARM mode. gcc/libgcc/ChangeLog: 2021-01-13 Daniel Engel * config/arm/eabi/lcmp.S (__aeabi_lcmp, __aeabi_ulcmp): Replaced; add macro configuration to build __cmpdi2() and __ucmpdi2(). * config/arm/t-elf (LIB1ASMFUNCS): Added _cmpdi2 and _ucmpdi2. --- libgcc/config/arm/eabi/lcmp.S | 151 +- libgcc/config/arm/t-elf | 2 + 2 files changed, 112 insertions(+), 41 deletions(-) diff --git a/libgcc/config/arm/eabi/lcmp.S b/libgcc/config/arm/eabi/lcmp.S index 336db1d398c..2ac9d178b34 100644 --- a/libgcc/config/arm/eabi/lcmp.S +++ b/libgcc/config/arm/eabi/lcmp.S @@ -1,8 +1,7 @@ -/* Miscellaneous BPABI functions. Thumb-1 implementation, suitable for ARMv4T, - ARMv6-M and ARMv8-M Baseline like ISA variants. +/* lcmp.S: Thumb-1 optimized 64-bit integer comparison - Copyright (C) 2006-2020 Free Software Foundation, Inc. - Contributed by CodeSourcery. + Copyright (C) 2018-2021 Free Software Foundation, Inc. + Contributed by Daniel Engel, Senva Inc (g...@danielengel.com) This file is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the @@ -24,50 +23,120 @@ <http://www.gnu.org/licenses/>. */ +#if defined(L_aeabi_lcmp) || defined(L_cmpdi2) + #ifdef L_aeabi_lcmp + #define LCMP_NAME aeabi_lcmp + #define LCMP_SECTION .text.sorted.libgcc.lcmp +#else + #define LCMP_NAME cmpdi2 + #define LCMP_SECTION .text.sorted.libgcc.cmpdi2 +#endif + +// int __aeabi_lcmp(long long, long long) +// int __cmpdi2(long long, long long) +// Compares the 64 bit signed values in $r1:$r0 and $r3:$r2. +// lcmp() returns $r0 = { -1, 0, +1 } for orderings { <, ==, > } respectively. +// cmpdi2() returns $r0 = { 0, 1, 2 } for orderings { <, ==, > } respectively. +// Object file duplication assumes typical programs follow one runtime ABI. +FUNC_START_SECTION LCMP_NAME LCMP_SECTION +CFI_START_FUNCTION + +// Calculate the difference $r1:$r0 - $r3:$r2. +subsxxl,yyl +sbcsxxh,yyh + +// With $r2 free, create a known offset value without affecting +// the N or Z flags. +// BUG? The originally unified instruction for v6m was 'mov r2, r3'. +// However, this resulted in a compile error with -mthumb: +//"MOV Rd, Rs with two low registers not permitted". +// Since unified syntax deprecates the "cpy" instruction, shouldn't +// there be a backwards-compatible tranlation available? +cpy r2, r3 + +// Evaluate the comparison result. +blt LLSYM(__lcmp_lt) + +// The reference offset ($r2 - $r3) will be +2 iff the first +// argument is larger, otherwise the offset value remains 0. +addsr2, #2 + +// Check for zero (equality in 64 bits). +// It doesn't matter which register was originally "hi". +orrsr0,r1 + +// The result is already 0 on equality. +beq LLSYM(__lcmp_return) + +LLSYM(__lcmp_lt): +// Create +1 or -1 from the offset value defined earlier. +addsr3, #1 +subsr0, r2, r3 + +LLSYM(__lcmp_return): + #ifdef L_cmpdi2 +// Offset to the correct output specification. 
+addsr0, #1 + #endif -FUNC_START aeabi_lcmp -cmp xxh, yyh -beq 1f -bgt 2f -movsr0, #1 -negsr0, r0 -RET -2: -movsr0, #1 -RET -1: -subsr0, xxl, yyl -beq 1f -bhi 2f -movsr0, #1 -negsr0, r0 -RET -2: -movsr0, #1 -1: RET -FUNC_END aeabi_lcmp -#endif /* L_aeabi_lcmp */ +CFI_END_FUNCTION +FUNC_END LCMP_NAME + +#endif /* L_aeabi_lcmp || L_cmpdi2 */ + + +#if defined(L_aeabi_ulcmp) || defined(L_ucmpdi2) #ifdef L_aeabi_ulcmp + #define ULCMP_NAME aeabi_ulcmp + #define ULCMP_SECTION .text.sorted.libgcc.ulcmp +#else + #define ULCMP_NAME ucmpdi2 + #define ULCMP_SECTION .text.sorted.libgcc.ucmpdi2 +#endif + +// int __aeabi_ulcmp(unsigned long long, unsigned long long) +// int __ucmpdi2(unsigned long long, unsigned long long) +// Compares the 64 bit unsigned values in $r1:$r0 and $r3:$r2. +// ulcmp() returns $r0 = { -1, 0, +1 } for orderings { <, ==, > } respectively. +// ucmpdi2() returns $r0 = { 0, 1, 2 } for orderings { <, ==, > } respectively. +// Object file duplication assumes typical programs follow one runtime ABI. +FUNC_START_SECTION ULCMP_NAME ULCMP_SECTION +CFI_START_FUNCTION + +// Calculate the 'C' flag. +subsxxl,yyl +sbcsxxh,yyh + +// Capture the carry flg. +// $r2 wil
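The return-value contract documented in the comments above reduces to the following C (reference semantics only; the assembly derives the -1/0/+1 from the subtraction flags rather than explicit compares):

    /* __aeabi_lcmp returns { -1, 0, +1 }; __cmpdi2 is the same result
       offset by +1 to give { 0, 1, 2 }.  */
    int lcmp_ref (long long a, long long b)
    {
      return (a < b) ? -1 : (a > b) ? 1 : 0;
    }

    int cmpdi2_ref (long long a, long long b)
    {
      return lcmp_ref (a, b) + 1;          /* the final 'adds r0, #1' */
    }

    /* In the unsigned flavor, 'sbcs r2, r2' captures the borrow:  */
    int ulcmp_ref (unsigned long long a, unsigned long long b)
    {
      int borrow = (a < b) ? -1 : 0;       /* what 'sbcs r2, r2' yields */
      return (a == b) ? 0 : (1 | borrow);  /* -1 if below, else +1 */
    }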
[PATCH v5 18/33] Merge Thumb-2 optimizations for 64-bit comparison
This effectively merges support for all architecture variants into a common function path with appropriate build conditions. ARM performance is 1-2 instructions faster; Thumb-2 is about 50% faster. gcc/libgcc/ChangeLog: 2021-01-13 Daniel Engel * config/arm/bpabi.S (__aeabi_lcmp, __aeabi_ulcmp): Removed. * config/arm/eabi/lcmp.S (__aeabi_lcmp, __aeabi_ulcmp): Added conditional execution on supported architectures (__ARM_FEATURE_IT). * config/arm/lib1funcs.S: Moved #include scope of eabi/lcmp.S. --- libgcc/config/arm/bpabi.S | 42 --- libgcc/config/arm/eabi/lcmp.S | 47 ++- libgcc/config/arm/lib1funcs.S | 2 +- 3 files changed, 47 insertions(+), 44 deletions(-) diff --git a/libgcc/config/arm/bpabi.S b/libgcc/config/arm/bpabi.S index 2cbb67d54ad..4281a2be594 100644 --- a/libgcc/config/arm/bpabi.S +++ b/libgcc/config/arm/bpabi.S @@ -34,48 +34,6 @@ .eabi_attribute 25, 1 #endif /* __ARM_EABI__ */ -#ifdef L_aeabi_lcmp - -ARM_FUNC_START aeabi_lcmp - cmp xxh, yyh - do_it lt - movlt r0, #-1 - do_it gt - movgt r0, #1 - do_it ne - RETc(ne) - subsr0, xxl, yyl - do_it lo - movlo r0, #-1 - do_it hi - movhi r0, #1 - RET - FUNC_END aeabi_lcmp - -#endif /* L_aeabi_lcmp */ - -#ifdef L_aeabi_ulcmp - -ARM_FUNC_START aeabi_ulcmp - cmp xxh, yyh - do_it lo - movlo r0, #-1 - do_it hi - movhi r0, #1 - do_it ne - RETc(ne) - cmp xxl, yyl - do_it lo - movlo r0, #-1 - do_it hi - movhi r0, #1 - do_it eq - moveq r0, #0 - RET - FUNC_END aeabi_ulcmp - -#endif /* L_aeabi_ulcmp */ - .macro test_div_by_zero signed /* Tail-call to divide-by-zero handlers which may be overridden by the user, so unwinding works properly. */ diff --git a/libgcc/config/arm/eabi/lcmp.S b/libgcc/config/arm/eabi/lcmp.S index 2ac9d178b34..f1a9c3b8fe0 100644 --- a/libgcc/config/arm/eabi/lcmp.S +++ b/libgcc/config/arm/eabi/lcmp.S @@ -46,6 +46,19 @@ FUNC_START_SECTION LCMP_NAME LCMP_SECTION subsxxl,yyl sbcsxxh,yyh +#ifdef __HAVE_FEATURE_IT +do_it lt,t + + #ifdef L_aeabi_lcmp +movlt r0,#-1 + #else +movlt r0,#0 + #endif + +// Early return on '<'. +RETc(lt) + +#else /* !__HAVE_FEATURE_IT */ // With $r2 free, create a known offset value without affecting // the N or Z flags. // BUG? The originally unified instruction for v6m was 'mov r2, r3'. @@ -62,17 +75,27 @@ FUNC_START_SECTION LCMP_NAME LCMP_SECTION // argument is larger, otherwise the offset value remains 0. addsr2, #2 +#endif + // Check for zero (equality in 64 bits). // It doesn't matter which register was originally "hi". orrsr0,r1 +#ifdef __HAVE_FEATURE_IT +// The result is already 0 on equality. +// -1 already returned, so just force +1. +do_it ne +movne r0, #1 + +#else /* !__HAVE_FEATURE_IT */ // The result is already 0 on equality. beq LLSYM(__lcmp_return) -LLSYM(__lcmp_lt): + LLSYM(__lcmp_lt): // Create +1 or -1 from the offset value defined earlier. addsr3, #1 subsr0, r2, r3 +#endif LLSYM(__lcmp_return): #ifdef L_cmpdi2 @@ -111,21 +134,43 @@ FUNC_START_SECTION ULCMP_NAME ULCMP_SECTION subsxxl,yyl sbcsxxh,yyh +#ifdef __HAVE_FEATURE_IT +do_it lo,t + + #ifdef L_aeabi_ulcmp +movlo r0, -1 + #else +movlo r0, #0 + #endif + +// Early return on '<'. +RETc(lo) + +#else // Capture the carry flg. // $r2 will contain -1 if the first value is smaller, // 0 if the first value is larger or equal. sbcsr2, r2 +#endif // Check for zero (equality in 64 bits). // It doesn't matter which register was originally "hi". orrsr0, r1 +#ifdef __HAVE_FEATURE_IT +// The result is already 0 on equality. +// -1 already returned, so just force +1. 
+do_it ne +movne r0, #1 + +#else /* !__HAVE_FEATURE_IT */ // The result is already 0 on equality. beq LLSYM(__ulcmp_return) // Assume +1. If -1 is correct, $r2 will override. movsr0, #1 orrsr0, r2 +#endif LLSYM(__ulcmp_return): #ifdef L_ucmpdi2 diff --git a/libgcc/config/arm/lib1funcs.S b/libgcc/config/arm/lib1funcs.S index 5e24d0a6749..f41354f811e 100644 --- a/libgcc/config/arm/lib1funcs.S +++ b
[PATCH v5 19/33] Import 32-bit division from the CM0 library
gcc/libgcc/ChangeLog: 2021-01-07 Daniel Engel * config/arm/eabi/idiv.S: New file for __udivsi3() and __divsi3(). * config/arm/lib1funcs.S: #include eabi/idiv.S (v6m only). --- libgcc/config/arm/eabi/idiv.S | 299 ++ libgcc/config/arm/lib1funcs.S | 19 ++- 2 files changed, 317 insertions(+), 1 deletion(-) create mode 100644 libgcc/config/arm/eabi/idiv.S diff --git a/libgcc/config/arm/eabi/idiv.S b/libgcc/config/arm/eabi/idiv.S new file mode 100644 index 000..7381e8f57a3 --- /dev/null +++ b/libgcc/config/arm/eabi/idiv.S @@ -0,0 +1,299 @@ +/* div.S: Thumb-1 size-optimized 32-bit integer division + + Copyright (C) 2018-2021 Free Software Foundation, Inc. + Contributed by Daniel Engel, Senva Inc (g...@danielengel.com) + + This file is free software; you can redistribute it and/or modify it + under the terms of the GNU General Public License as published by the + Free Software Foundation; either version 3, or (at your option) any + later version. + + This file is distributed in the hope that it will be useful, but + WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + General Public License for more details. + + Under Section 7 of GPL version 3, you are granted additional + permissions described in the GCC Runtime Library Exception, version + 3.1, as published by the Free Software Foundation. + + You should have received a copy of the GNU General Public License and + a copy of the GCC Runtime Library Exception along with this program; + see the files COPYING3 and COPYING.RUNTIME respectively. If not, see + <http://www.gnu.org/licenses/>. */ + + +#ifndef __GNUC__ + +// int __aeabi_idiv0(int) +// Helper function for division by 0. +WEAK_START_SECTION aeabi_idiv0 .text.sorted.libgcc.idiv.idiv0 +FUNC_ALIAS cm0_idiv0 aeabi_idiv0 +CFI_START_FUNCTION + + #if defined(TRAP_EXCEPTIONS) && TRAP_EXCEPTIONS +svc #(SVC_DIVISION_BY_ZERO) + #endif + +RET + +CFI_END_FUNCTION +FUNC_END cm0_idiv0 +FUNC_END aeabi_idiv0 + +#endif /* !__GNUC__ */ + + +#ifdef L_divsi3 + +// int __aeabi_idiv(int, int) +// idiv_return __aeabi_idivmod(int, int) +// Returns signed $r0 after division by $r1. +// Also returns the signed remainder in $r1. +// Same parent section as __divsi3() to keep branches within range. +FUNC_START_SECTION divsi3 .text.sorted.libgcc.idiv.divsi3 + +#ifndef __symbian__ + FUNC_ALIAS aeabi_idiv divsi3 + FUNC_ALIAS aeabi_idivmod divsi3 +#endif + +CFI_START_FUNCTION + +// Extend signs. +asrsr2, r0, #31 +asrsr3, r1, #31 + +// Absolute value of the denominator, abort on division by zero. +eorsr1, r3 +subsr1, r3 + #if defined(PEDANTIC_DIV0) && PEDANTIC_DIV0 +beq LLSYM(__idivmod_zero) + #else +beq SYM(__uidivmod_zero) + #endif + +// Absolute value of the numerator. +eorsr0, r2 +subsr0, r2 + +// Keep the sign of the numerator in bit[31] (for the remainder). +// Save the XOR of the signs in bits[15:0] (for the quotient). +push{ rT, lr } +.cfi_remember_state +.cfi_adjust_cfa_offset 8 +.cfi_rel_offset rT, 0 +.cfi_rel_offset lr, 4 + +lsrsrT, r3, #16 +eorsrT, r2 + +// Handle division as unsigned. +bl SYM(__uidivmod_nonzero) __PLT__ + +// Set the sign of the remainder. +asrsr2, rT, #31 +eorsr1, r2 +subsr1, r2 + +// Set the sign of the quotient. 
+sxthr3, rT +eorsr0, r3 +subsr0, r3 + +LLSYM(__idivmod_return): +pop { rT, pc } +.cfi_restore_state + + #if defined(PEDANTIC_DIV0) && PEDANTIC_DIV0 +LLSYM(__idivmod_zero): +// Set up the *div0() parameter specified in the ARM runtime ABI: +// * 0 if the numerator is 0, +// * Or, the largest value of the type manipulated by the calling +// division function if the numerator is positive, +// * Or, the least value of the type manipulated by the calling +// division function if the numerator is negative. +subsr1, r0 +orrsr0, r1 +asrsr0, #31 +lsrsr0, #1 +eorsr0, r2 + +// At least the __aeabi_idiv0() call is common. +b SYM(__uidivmod_zero2) + #endif /* PEDANTIC_DIV0 */ + +CFI_END_FUNCTION +FUNC_END divsi3 + +#ifndef __symbian__ + FUNC_END aeabi_idiv + FUNC_END aeabi_idivmod +#endif + +#endif /* L_divsi3 */ + + +#ifdef L_udivsi3 + +// int __aeabi_uidiv(unsigned int, unsigned int) +// idiv_return __aeabi_uidivmod(unsigned int, unsigned
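The sign handling around the unsigned core above can be sketched in C as follows. This is a reference sketch only; uidivmod_ref is a hypothetical stand-in for the __uidivmod_nonzero entry point, and the real code packs both signs into rT as described in the comments:

    extern unsigned int uidivmod_ref (unsigned int n, unsigned int d,
                                      unsigned int *rem);   /* hypothetical */

    int divsi3_ref (int n, int d)   /* d != 0 assumed */
    {
      unsigned int un = (n < 0) ? 0u - (unsigned int) n : (unsigned int) n;
      unsigned int ud = (d < 0) ? 0u - (unsigned int) d : (unsigned int) d;
      unsigned int ur;
      unsigned int uq = uidivmod_ref (un, ud, &ur);
      /* The quotient takes the XOR of the operand signs; the remainder
         (returned in $r1 by the assembly) takes the numerator's sign.  */
      return ((n < 0) != (d < 0)) ? (int) (0u - uq) : (int) uq;
    }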
[PATCH v5 20/33] Refactor Thumb-1 64-bit division into a new file
gcc/libgcc/ChangeLog: 2021-01-13 Daniel Engel * config/arm/bpabi-v6m.S (__aeabi_ldivmod/ldivmod): Moved to ... * config/arm/eabi/ldiv.S: New file. * config/arm/lib1funcs.S: #include eabi/ldiv.S (v6m only). --- libgcc/config/arm/bpabi-v6m.S | 81 - libgcc/config/arm/eabi/ldiv.S | 107 ++ libgcc/config/arm/lib1funcs.S | 1 + 3 files changed, 108 insertions(+), 81 deletions(-) create mode 100644 libgcc/config/arm/eabi/ldiv.S diff --git a/libgcc/config/arm/bpabi-v6m.S b/libgcc/config/arm/bpabi-v6m.S index a051c1530a4..b3dc3bf8f4d 100644 --- a/libgcc/config/arm/bpabi-v6m.S +++ b/libgcc/config/arm/bpabi-v6m.S @@ -34,87 +34,6 @@ #endif /* __ARM_EABI__ */ -.macro test_div_by_zero signed - cmp yyh, #0 - bne 7f - cmp yyl, #0 - bne 7f - cmp xxh, #0 - .ifc\signed, unsigned - bne 2f - cmp xxl, #0 -2: - beq 3f - movsxxh, #0 - mvnsxxh, xxh@ 0x - movsxxl, xxh -3: - .else - blt 6f - bgt 4f - cmp xxl, #0 - beq 5f -4: movsxxl, #0 - mvnsxxl, xxl@ 0x - lsrsxxh, xxl, #1@ 0x7fff - b 5f -6: movsxxh, #0x80 - lslsxxh, xxh, #24 @ 0x8000 - movsxxl, #0 -5: - .endif - @ tailcalls are tricky on v6-m. - push{r0, r1, r2} - ldr r0, 1f - adr r1, 1f - addsr0, r1 - str r0, [sp, #8] - @ We know we are not on armv4t, so pop pc is safe. - pop {r0, r1, pc} - .align 2 -1: - .word __aeabi_ldiv0 - 1b -7: -.endm - -#ifdef L_aeabi_ldivmod - -FUNC_START aeabi_ldivmod - test_div_by_zero signed - - push{r0, r1} - mov r0, sp - push{r0, lr} - ldr r0, [sp, #8] - bl SYM(__gnu_ldivmod_helper) - ldr r3, [sp, #4] - mov lr, r3 - add sp, sp, #8 - pop {r2, r3} - RET - FUNC_END aeabi_ldivmod - -#endif /* L_aeabi_ldivmod */ - -#ifdef L_aeabi_uldivmod - -FUNC_START aeabi_uldivmod - test_div_by_zero unsigned - - push{r0, r1} - mov r0, sp - push{r0, lr} - ldr r0, [sp, #8] - bl SYM(__udivmoddi4) - ldr r3, [sp, #4] - mov lr, r3 - add sp, sp, #8 - pop {r2, r3} - RET - FUNC_END aeabi_uldivmod - -#endif /* L_aeabi_uldivmod */ - #ifdef L_arm_addsubsf3 FUNC_START aeabi_frsub diff --git a/libgcc/config/arm/eabi/ldiv.S b/libgcc/config/arm/eabi/ldiv.S new file mode 100644 index 000..3c8280ef580 --- /dev/null +++ b/libgcc/config/arm/eabi/ldiv.S @@ -0,0 +1,107 @@ +/* Miscellaneous BPABI functions. Thumb-1 implementation, suitable for ARMv4T, + ARMv6-M and ARMv8-M Baseline like ISA variants. + + Copyright (C) 2006-2020 Free Software Foundation, Inc. + Contributed by CodeSourcery. + + This file is free software; you can redistribute it and/or modify it + under the terms of the GNU General Public License as published by the + Free Software Foundation; either version 3, or (at your option) any + later version. + + This file is distributed in the hope that it will be useful, but + WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + General Public License for more details. + + Under Section 7 of GPL version 3, you are granted additional + permissions described in the GCC Runtime Library Exception, version + 3.1, as published by the Free Software Foundation. + + You should have received a copy of the GNU General Public License and + a copy of the GCC Runtime Library Exception along with this program; + see the files COPYING3 and COPYING.RUNTIME respectively. If not, see + <http://www.gnu.org/licenses/>. 
*/ + + +.macro test_div_by_zero signed +cmp yyh, #0 +bne 7f +cmp yyl, #0 +bne 7f +cmp xxh, #0 +.ifc\signed, unsigned +bne 2f +cmp xxl, #0 +2: +beq 3f +movsxxh, #0 +mvnsxxh, xxh@ 0x +movsxxl, xxh +3: +.else +blt 6f +bgt 4f +cmp xxl, #0 +beq 5f +4: movsxxl, #0 +mvnsxxl, xxl@ 0x +lsrsxxh, xxl, #1@ 0x7fff +b 5f +6: movsxxh, #0x80 +lslsxxh, xxh, #24 @ 0x8000 +movsxxl, #0 +5: +.endif +@ tailcalls are tricky on v6-m. +push{r0, r1, r2} +ldr r0, 1f +adr r1, 1f +addsr0, r1 +str r0, [sp, #8] +@ We know we are not on armv4t,
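Before tail-calling __aeabi_ldiv0, the test_div_by_zero macro above materializes the saturated results expected for division by zero. In C terms, its effect is roughly (sketch, hypothetical names):

    /* Saturated 64-bit results for division by zero, per the macro.  */
    long long ldiv0_signed_ref (long long num)
    {
      if (num > 0) return 0x7fffffffffffffffLL;        /* LLONG_MAX */
      if (num < 0) return -0x7fffffffffffffffLL - 1;   /* LLONG_MIN */
      return 0;
    }

    unsigned long long ldiv0_unsigned_ref (unsigned long long num)
    {
      return num ? 0xffffffffffffffffULL : 0;
    }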
[PATCH v5 21/33] Import 64-bit division from the CM0 library
gcc/libgcc/ChangeLog: 2021-01-13 Daniel Engel * config/arm/bpabi.c: Deleted unused file. * config/arm/eabi/ldiv.S (__aeabi_ldivmod, __aeabi_uldivmod): Replaced wrapper functions with a complete implementation. * config/arm/t-bpabi (LIB2ADD_ST): Removed bpabi.c. * config/arm/t-elf (LIB1ASMFUNCS): Added _divdi3 and _udivdi3. --- libgcc/config/arm/bpabi.c | 42 --- libgcc/config/arm/eabi/ldiv.S | 542 +- libgcc/config/arm/t-bpabi | 3 +- libgcc/config/arm/t-elf | 9 + 4 files changed, 474 insertions(+), 122 deletions(-) delete mode 100644 libgcc/config/arm/bpabi.c diff --git a/libgcc/config/arm/bpabi.c b/libgcc/config/arm/bpabi.c deleted file mode 100644 index bf6ba757964..000 --- a/libgcc/config/arm/bpabi.c +++ /dev/null @@ -1,42 +0,0 @@ -/* Miscellaneous BPABI functions. - - Copyright (C) 2003-2021 Free Software Foundation, Inc. - Contributed by CodeSourcery, LLC. - - This file is free software; you can redistribute it and/or modify it - under the terms of the GNU General Public License as published by the - Free Software Foundation; either version 3, or (at your option) any - later version. - - This file is distributed in the hope that it will be useful, but - WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU - General Public License for more details. - - Under Section 7 of GPL version 3, you are granted additional - permissions described in the GCC Runtime Library Exception, version - 3.1, as published by the Free Software Foundation. - - You should have received a copy of the GNU General Public License and - a copy of the GCC Runtime Library Exception along with this program; - see the files COPYING3 and COPYING.RUNTIME respectively. If not, see - <http://www.gnu.org/licenses/>. */ - -extern long long __divdi3 (long long, long long); -extern unsigned long long __udivdi3 (unsigned long long, -unsigned long long); -extern long long __gnu_ldivmod_helper (long long, long long, long long *); - - -long long -__gnu_ldivmod_helper (long long a, - long long b, - long long *remainder) -{ - long long quotient; - - quotient = __divdi3 (a, b); - *remainder = a - b * quotient; - return quotient; -} - diff --git a/libgcc/config/arm/eabi/ldiv.S b/libgcc/config/arm/eabi/ldiv.S index 3c8280ef580..c225e5973b2 100644 --- a/libgcc/config/arm/eabi/ldiv.S +++ b/libgcc/config/arm/eabi/ldiv.S @@ -1,8 +1,7 @@ -/* Miscellaneous BPABI functions. Thumb-1 implementation, suitable for ARMv4T, - ARMv6-M and ARMv8-M Baseline like ISA variants. +/* ldiv.S: Thumb-1 optimized 64-bit integer division - Copyright (C) 2006-2020 Free Software Foundation, Inc. - Contributed by CodeSourcery. + Copyright (C) 2018-2021 Free Software Foundation, Inc. + Contributed by Daniel Engel, Senva Inc (g...@danielengel.com) This file is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the @@ -24,84 +23,471 @@ <http://www.gnu.org/licenses/>. */ -.macro test_div_by_zero signed -cmp yyh, #0 -bne 7f -cmp yyl, #0 -bne 7f -cmp xxh, #0 -.ifc\signed, unsigned -bne 2f -cmp xxl, #0 -2: -beq 3f -movsxxh, #0 -mvnsxxh, xxh@ 0x -movsxxl, xxh -3: -.else -blt 6f -bgt 4f -cmp xxl, #0 -beq 5f -4: movsxxl, #0 -mvnsxxl, xxl@ 0x -lsrsxxh, xxl, #1@ 0x7fff -b 5f -6: movsxxh, #0x80 -lslsxxh, xxh, #24 @ 0x8000 -movsxxl, #0 -5: -.endif -@ tailcalls are tricky on v6-m. -push{r0, r1, r2} -ldr r0, 1f -adr r1, 1f -addsr0, r1 -str r0, [sp, #8] -@ We know we are not on armv4t, so pop pc is safe. 
-pop {r0, r1, pc} -.align 2 -1: -.word __aeabi_ldiv0 - 1b -7: -.endm - -#ifdef L_aeabi_ldivmod - -FUNC_START aeabi_ldivmod -test_div_by_zero signed - -push{r0, r1} -mov r0, sp -push{r0, lr} -ldr r0, [sp, #8] -bl SYM(__gnu_ldivmod_helper) -ldr r3, [sp, #4] -mov lr, r3 -add sp, sp, #8 -pop {r2, r3} +#ifndef __GNUC__ + +// long long __aeabi_ldiv0(long long) +// Helper function for division by 0. +WEAK_START_SECTION aeabi_ldiv0 .text.sorted.libgcc.ldiv.ldiv0 +CFI_START_FUNCTION + + #if defined(TRAP_EXCEPTIONS) && TRAP_EXCEPTIONS +svc #(SVC_DIVISION_BY_ZERO) + #endif + R
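For orientation: the deleted C helper computed the remainder from the quotient, while the replacement computes both directly. The contract being implemented is, in effect, a pair return (sketch; the register mapping follows the old wrapper's pops, quotient in $r1:$r0 and remainder in $r3:$r2):

    typedef struct { unsigned long long quot, rem; } uldivmod_t;

    uldivmod_t uldivmod_ref (unsigned long long n, unsigned long long d)
    {
      uldivmod_t r = { n / d, n % d };   /* d != 0 assumed */
      return r;
    }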
[PATCH v5 22/33] Import integer multiplication from the CM0 library
gcc/libgcc/ChangeLog: 2021-01-07 Daniel Engel * config/arm/eabi/lmul.S: New file for __muldi3(), __mulsidi3(), and __umulsidi3(). * config/arm/lib1funcs.S: #eabi/lmul.S (v6m only). * config/arm/t-elf: Add the new objects to LIB1ASMFUNCS. --- libgcc/config/arm/eabi/lmul.S | 218 ++ libgcc/config/arm/lib1funcs.S | 1 + libgcc/config/arm/t-elf | 13 +- 3 files changed, 230 insertions(+), 2 deletions(-) create mode 100644 libgcc/config/arm/eabi/lmul.S diff --git a/libgcc/config/arm/eabi/lmul.S b/libgcc/config/arm/eabi/lmul.S new file mode 100644 index 000..9fec4364a26 --- /dev/null +++ b/libgcc/config/arm/eabi/lmul.S @@ -0,0 +1,218 @@ +/* lmul.S: Thumb-1 optimized 64-bit integer multiplication + + Copyright (C) 2018-2021 Free Software Foundation, Inc. + Contributed by Daniel Engel, Senva Inc (g...@danielengel.com) + + This file is free software; you can redistribute it and/or modify it + under the terms of the GNU General Public License as published by the + Free Software Foundation; either version 3, or (at your option) any + later version. + + This file is distributed in the hope that it will be useful, but + WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + General Public License for more details. + + Under Section 7 of GPL version 3, you are granted additional + permissions described in the GCC Runtime Library Exception, version + 3.1, as published by the Free Software Foundation. + + You should have received a copy of the GNU General Public License and + a copy of the GCC Runtime Library Exception along with this program; + see the files COPYING3 and COPYING.RUNTIME respectively. If not, see + <http://www.gnu.org/licenses/>. */ + + +#ifdef L_muldi3 + +// long long __aeabi_lmul(long long, long long) +// Returns the least significant 64 bits of a 64 bit multiplication. +// Expects the two multiplicands in $r1:$r0 and $r3:$r2. +// Returns the product in $r1:$r0 (does not distinguish signed types). +// Uses $r4 and $r5 as scratch space. +// Same parent section as __umulsidi3() to keep tail call branch within range. +FUNC_START_SECTION muldi3 .text.sorted.libgcc.lmul.muldi3 + +#ifndef __symbian__ + FUNC_ALIAS aeabi_lmul muldi3 +#endif + +CFI_START_FUNCTION + +// $r1:$r0 = 0x +// $r3:$r2 = 0x + +// The following operations that only affect the upper 64 bits +// can be safely discarded: +// * +// * +// * +// * +// * +// * + +// MAYBE: Test for multiply by ZERO on implementations with a 32-cycle +// 'muls' instruction, and skip over the operation in that case. + +// (0x * 0x), free $r1 +mulsxxh,yyl + +// (0x * 0x), free $r3 +mulsyyh,xxl +addsyyh,xxh + +// Put the parameters in the correct form for umulsidi3(). +movsxxh,yyl +b LLSYM(__mul_overflow) + +CFI_END_FUNCTION +FUNC_END muldi3 + +#ifndef __symbian__ + FUNC_END aeabi_lmul +#endif + +#endif /* L_muldi3 */ + + +// The following implementation of __umulsidi3() integrates with __muldi3() +// above to allow the fast tail call while still preserving the extra +// hi-shifted bits of the result. However, these extra bits add a few +// instructions not otherwise required when using only __umulsidi3(). +// Therefore, this block configures __umulsidi3() for compilation twice. +// The first version is a minimal standalone implementation, and the second +// version adds the hi bits of __muldi3(). The standalone version must +// be declared WEAK, so that the combined version can supersede it and +// provide both symbols in programs that multiply long doubles. 
+// This means '_umulsidi3' should appear before '_muldi3' in LIB1ASMFUNCS. +#if defined(L_muldi3) || defined(L_umulsidi3) + +#ifdef L_umulsidi3 +// unsigned long long __umulsidi3(unsigned int, unsigned int) +// Returns all 64 bits of a 32 bit multiplication. +// Expects the two multiplicands in $r0 and $r1. +// Returns the product in $r1:$r0. +// Uses $r3, $r4 and $ip as scratch space. +WEAK_START_SECTION umulsidi3 .text.sorted.libgcc.lmul.umulsidi3 +CFI_START_FUNCTION + +#else /* L_muldi3 */ +FUNC_ENTRY umulsidi3 +CFI_START_FUNCTION + +// 32x32 multiply with 64 bit result. +// Expand the multiply into 4 parts, since muls only returns 32 bits. +// (a16h * b16h / 2^32) +// + (a16h * b16l / 2^48) + (a16l * b16h / 2^48) +// + (a16l * b16l / 2^64) + +// MAYBE: Test for multiply by 0 on implementations with a 32-cycle +// 'muls' instruc
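The four-part expansion described in the comments above corresponds to this C sketch (reference only; umulsidi3_ref is a hypothetical name). Each 16x16 partial product fits in 32 bits, which is what makes the Thumb-1 'muls' decomposition work; __muldi3 then only needs the cross terms that survive in the low 64 bits:

    /* 32x32 -> 64 multiply from four 16x16 partial products.  */
    unsigned long long umulsidi3_ref (unsigned int a, unsigned int b)
    {
      unsigned int ah = a >> 16, al = a & 0xffff;
      unsigned int bh = b >> 16, bl = b & 0xffff;
      unsigned long long r = (unsigned long long) al * bl;
      r += (unsigned long long) (ah * bl) << 16;   /* fits in 32 bits */
      r += (unsigned long long) (al * bh) << 16;
      r += (unsigned long long) (ah * bh) << 32;
      return r;
    }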
[PATCH v5 23/33] Refactor Thumb-1 float comparison into a new file
gcc/libgcc/ChangeLog: 2021-01-13 Daniel Engel * config/arm/bpabi-v6m.S (__aeabi_cfcmpeq, __aeabi_cfcmple, __aeabi_cfrcmple, __aeabi_fcmpeq, __aeabi_fcmple, aeabi_fcmple, __aeabi_fcmpgt, aeabi_fcmpge): Moved to ... * config/arm/eabi/fcmp.S: New file. * config/arm/lib1funcs.S: #include eabi/fcmp.S (v6m only). --- libgcc/config/arm/bpabi-v6m.S | 63 - libgcc/config/arm/eabi/fcmp.S | 89 +++ libgcc/config/arm/lib1funcs.S | 1 + 3 files changed, 90 insertions(+), 63 deletions(-) create mode 100644 libgcc/config/arm/eabi/fcmp.S diff --git a/libgcc/config/arm/bpabi-v6m.S b/libgcc/config/arm/bpabi-v6m.S index b3dc3bf8f4d..7c874f06218 100644 --- a/libgcc/config/arm/bpabi-v6m.S +++ b/libgcc/config/arm/bpabi-v6m.S @@ -49,69 +49,6 @@ FUNC_START aeabi_frsub #endif /* L_arm_addsubsf3 */ -#ifdef L_arm_cmpsf2 - -FUNC_START aeabi_cfrcmple - - mov ip, r0 - movsr0, r1 - mov r1, ip - b 6f - -FUNC_START aeabi_cfcmpeq -FUNC_ALIAS aeabi_cfcmple aeabi_cfcmpeq - - @ The status-returning routines are required to preserve all - @ registers except ip, lr, and cpsr. -6: push{r0, r1, r2, r3, r4, lr} - bl __lesf2 - @ Set the Z flag correctly, and the C flag unconditionally. - cmp r0, #0 - @ Clear the C flag if the return value was -1, indicating - @ that the first operand was smaller than the second. - bmi 1f - movsr1, #0 - cmn r0, r1 -1: - pop {r0, r1, r2, r3, r4, pc} - - FUNC_END aeabi_cfcmple - FUNC_END aeabi_cfcmpeq - FUNC_END aeabi_cfrcmple - -FUNC_START aeabi_fcmpeq - - push{r4, lr} - bl __eqsf2 - negsr0, r0 - addsr0, r0, #1 - pop {r4, pc} - - FUNC_END aeabi_fcmpeq - -.macro COMPARISON cond, helper, mode=sf2 -FUNC_START aeabi_fcmp\cond - - push{r4, lr} - bl __\helper\mode - cmp r0, #0 - b\cond 1f - movsr0, #0 - pop {r4, pc} -1: - movsr0, #1 - pop {r4, pc} - - FUNC_END aeabi_fcmp\cond -.endm - -COMPARISON lt, le -COMPARISON le, le -COMPARISON gt, ge -COMPARISON ge, ge - -#endif /* L_arm_cmpsf2 */ - #ifdef L_arm_addsubdf3 FUNC_START aeabi_drsub diff --git a/libgcc/config/arm/eabi/fcmp.S b/libgcc/config/arm/eabi/fcmp.S new file mode 100644 index 000..96d627f1fea --- /dev/null +++ b/libgcc/config/arm/eabi/fcmp.S @@ -0,0 +1,89 @@ +/* Miscellaneous BPABI functions. Thumb-1 implementation, suitable for ARMv4T, + ARMv6-M and ARMv8-M Baseline like ISA variants. + + Copyright (C) 2006-2020 Free Software Foundation, Inc. + Contributed by CodeSourcery. + + This file is free software; you can redistribute it and/or modify it + under the terms of the GNU General Public License as published by the + Free Software Foundation; either version 3, or (at your option) any + later version. + + This file is distributed in the hope that it will be useful, but + WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + General Public License for more details. + + Under Section 7 of GPL version 3, you are granted additional + permissions described in the GCC Runtime Library Exception, version + 3.1, as published by the Free Software Foundation. + + You should have received a copy of the GNU General Public License and + a copy of the GCC Runtime Library Exception along with this program; + see the files COPYING3 and COPYING.RUNTIME respectively. If not, see + <http://www.gnu.org/licenses/>. */ + + +#ifdef L_arm_cmpsf2 + +FUNC_START aeabi_cfrcmple + + mov ip, r0 + movsr0, r1 + mov r1, ip + b 6f + +FUNC_START aeabi_cfcmpeq +FUNC_ALIAS aeabi_cfcmple aeabi_cfcmpeq + + @ The status-returning routines are required to preserve all + @ registers except ip, lr, and cpsr. 
+6: push{r0, r1, r2, r3, r4, lr} + bl __lesf2 + @ Set the Z flag correctly, and the C flag unconditionally. + cmp r0, #0 + @ Clear the C flag if the return value was -1, indicating + @ that the first operand was smaller than the second. + bmi 1f + movsr1, #0 + cmn r0, r1 +1: + pop {r0, r1, r2, r3, r4, pc} + + FUNC_END aeabi_cfcmple + FUNC_END aeabi_cfcmpeq + FUNC_END aeabi_cfrcmple + +FUNC_START aeabi_fcmpeq + + push{r4, lr} + bl __eqsf2 + negsr0, r0 + addsr0, r0, #1 + pop {r4, pc} + + FUNC_END aeabi_fcmpeq + +.macro COMPARISON cond, helper, mode=sf2 +FUNC_START aeabi_fcmp\cond + + push{r4, lr} + bl __\helper\mode + cmp r0, #0 + b\cond 1f + movsr0, #0 + pop {r4, pc} +1: + movs
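The COMPARISON macro above translates the three-way soft-float result into the boolean AEABI predicates. Assuming the usual libgcc conventions (__lesf2 returns positive when unordered, __gesf2 returns negative), the moved wrappers amount to (sketch):

    extern int __lesf2 (float, float);   /* > 0 when unordered */
    extern int __gesf2 (float, float);   /* < 0 when unordered */

    int fcmplt_ref (float a, float b) { return __lesf2 (a, b) <  0; }
    int fcmple_ref (float a, float b) { return __lesf2 (a, b) <= 0; }
    int fcmpge_ref (float a, float b) { return __gesf2 (a, b) >= 0; }
    int fcmpgt_ref (float a, float b) { return __gesf2 (a, b) >  0; }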
[PATCH v5 24/33] Import float comparison from the CM0 library
These functions are significantly smaller and faster than the wrapper functions and soft-float implementation they replace. Using the first comparison operator (e.g. '<=') in any program costs about 70 bytes initially, but every additional operator incrementally adds just 4 bytes. NOTE: It seems that the __aeabi_cfcmp*() routines formerly in bpabi-v6m.S were not well tested, as they returned wrong results for the 'C' flag. The replacement functions are fully tested. gcc/libgcc/ChangeLog: 2021-01-13 Daniel Engel * config/arm/eabi/fcmp.S (__cmpsf2, __eqsf2, __gesf2, __aeabi_fcmpne, __aeabi_fcmpun): Added new functions. (__aeabi_fcmpeq, __aeabi_fcmpne, __aeabi_fcmplt, __aeabi_fcmple, __aeabi_fcmpge, __aeabi_fcmpgt, __aeabi_cfcmple, __aeabi_cfcmpeq, __aeabi_cfrcmple): Replaced with branches to __internal_cmpsf2(). * config/arm/eabi/fplib.h: New file with fcmp-specific constants and general build configuration macros. * config/arm/lib1funcs.S: #include eabi/fplib.h (v6m only). * config/arm/t-elf (LIB1ASMFUNCS): Added _internal_cmpsf2, _arm_cfcmpeq, _arm_cfcmple, _arm_cfrcmple, _arm_fcmpeq, _arm_fcmpge, _arm_fcmpgt, _arm_fcmple, _arm_fcmplt, _arm_fcmpne, _arm_eqsf2, and _arm_gesf2. --- libgcc/config/arm/eabi/fcmp.S | 643 + libgcc/config/arm/eabi/fplib.h | 83 + libgcc/config/arm/lib1funcs.S | 1 + libgcc/config/arm/t-elf| 18 + 4 files changed, 681 insertions(+), 64 deletions(-) create mode 100644 libgcc/config/arm/eabi/fplib.h diff --git a/libgcc/config/arm/eabi/fcmp.S b/libgcc/config/arm/eabi/fcmp.S index 96d627f1fea..cada33f4d35 100644 --- a/libgcc/config/arm/eabi/fcmp.S +++ b/libgcc/config/arm/eabi/fcmp.S @@ -1,8 +1,7 @@ -/* Miscellaneous BPABI functions. Thumb-1 implementation, suitable for ARMv4T, - ARMv6-M and ARMv8-M Baseline like ISA variants. +/* fcmp.S: Thumb-1 optimized 32-bit float comparison - Copyright (C) 2006-2020 Free Software Foundation, Inc. - Contributed by CodeSourcery. + Copyright (C) 2018-2021 Free Software Foundation, Inc. + Contributed by Daniel Engel, Senva Inc (g...@danielengel.com) This file is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the @@ -24,66 +23,582 @@ <http://www.gnu.org/licenses/>. */ +// The various compare functions in this file all expect to tail call __cmpsf2() +// with flags set for a particular comparison mode. The __internal_cmpsf2() +// symbol itself is unambiguous, but there is a remote risk that the linker +// will prefer some other symbol in place of __cmpsf2(). Importing an archive +// file that also exports __cmpsf2() will throw an error in this case. +// As a workaround, this block configures __aeabi_f2lz() for compilation twice. +// The first version configures __internal_cmpsf2() as a WEAK standalone symbol, +// and the second exports __cmpsf2() and __internal_cmpsf2() normally. +// A small bonus: programs not using __cmpsf2() itself will be slightly smaller. +// 'L_internal_cmpsf2' should appear before 'L_arm_cmpsf2' in LIB1ASMFUNCS. +#if defined(L_arm_cmpsf2) || defined(L_internal_cmpsf2) + +#define CMPSF2_SECTION .text.sorted.libgcc.fcmp.cmpsf2 + +// int __cmpsf2(float, float) +// <https://gcc.gnu.org/onlinedocs/gccint/Soft-float-library-routines.html> +// Returns the three-way comparison result of $r0 with $r1: +// * +1 if ($r0 > $r1), or either argument is NAN +// * 0 if ($r0 == $r1) +// * -1 if ($r0 < $r1) +// Uses $r2, $r3, and $ip as scratch space. 
+#ifdef L_arm_cmpsf2 +FUNC_START_SECTION cmpsf2 CMPSF2_SECTION +FUNC_ALIAS lesf2 cmpsf2 +FUNC_ALIAS ltsf2 cmpsf2 +CFI_START_FUNCTION + +// Assumption: The 'libgcc' functions should raise exceptions. +movsr2, #(FCMP_UN_POSITIVE + FCMP_RAISE_EXCEPTIONS + FCMP_3WAY) + +// int,int __internal_cmpsf2(float, float, int) +// Internal function expects a set of control flags in $r2. +// If ordered, returns a comparison type { 0, 1, 2 } in $r3 +FUNC_ENTRY internal_cmpsf2 + +#else /* L_internal_cmpsf2 */ +WEAK_START_SECTION internal_cmpsf2 CMPSF2_SECTION +CFI_START_FUNCTION + +#endif + +// When operand signs are considered, the comparison result falls +// within one of the following quadrants: +// +// $r0 $r1 $r0-$r1* flags result +// ++ > C=0 GT +// ++ = Z=1 EQ +// ++ < C=1 LT +// +- > C=1 GT +// +- = C=1 GT +// +- < C=1 GT +// -+ > C=0 LT +// -+ = C=0 LT +// -+ < C=0 LT +// -- > C=0
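The quadrant table above amounts to a sign-magnitude comparison on the raw bit patterns. A C sketch of the finite-value case (NaN is screened out beforehand, as the control flags describe; hypothetical helper name):

    /* Three-way compare on raw float bits; no NaN inputs.  */
    int fcmp3_ref (unsigned int a, unsigned int b)
    {
      if (((a | b) << 1) == 0) return 0;    /* +0 and -0 are equal */
      if ((a ^ b) >> 31)                    /* opposite signs */
        return (b >> 31) ? 1 : -1;          /* the positive one is greater */
      if (a == b) return 0;
      int gt = (a > b) ? 1 : -1;            /* magnitude order */
      return (a >> 31) ? -gt : gt;          /* reversed when both negative */
    }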
[PATCH v5 25/33] Refactor Thumb-1 float subtraction into a new file
This will make it easier to isolate changes in subsequent patches. gcc/libgcc/ChangeLog: 2021-01-13 Daniel Engel * config/arm/bpabi-v6m.S (__aeabi_frsub): Moved to ... * config/arm/eabi/fadd.S: New file. * config/arm/lib1funcs.S: #include eabi/fadd.S (v6m only). --- libgcc/config/arm/bpabi-v6m.S | 16 --- libgcc/config/arm/eabi/fadd.S | 38 +++ libgcc/config/arm/lib1funcs.S | 1 + 3 files changed, 39 insertions(+), 16 deletions(-) create mode 100644 libgcc/config/arm/eabi/fadd.S diff --git a/libgcc/config/arm/bpabi-v6m.S b/libgcc/config/arm/bpabi-v6m.S index 7c874f06218..c76c3b0568b 100644 --- a/libgcc/config/arm/bpabi-v6m.S +++ b/libgcc/config/arm/bpabi-v6m.S @@ -33,22 +33,6 @@ .eabi_attribute 25, 1 #endif /* __ARM_EABI__ */ - -#ifdef L_arm_addsubsf3 - -FUNC_START aeabi_frsub - - push {r4, lr} - movs r4, #1 - lsls r4, #31 - eors r0, r0, r4 - bl __aeabi_fadd - pop {r4, pc} - - FUNC_END aeabi_frsub - -#endif /* L_arm_addsubsf3 */ - #ifdef L_arm_addsubdf3 FUNC_START aeabi_drsub diff --git a/libgcc/config/arm/eabi/fadd.S b/libgcc/config/arm/eabi/fadd.S new file mode 100644 index 000..fffbd91d1bc --- /dev/null +++ b/libgcc/config/arm/eabi/fadd.S @@ -0,0 +1,38 @@ +/* Copyright (C) 2006-2021 Free Software Foundation, Inc. + Contributed by CodeSourcery. + + This file is free software; you can redistribute it and/or modify it + under the terms of the GNU General Public License as published by the + Free Software Foundation; either version 3, or (at your option) any + later version. + + This file is distributed in the hope that it will be useful, but + WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + General Public License for more details. + + Under Section 7 of GPL version 3, you are granted additional + permissions described in the GCC Runtime Library Exception, version + 3.1, as published by the Free Software Foundation. + + You should have received a copy of the GNU General Public License and + a copy of the GCC Runtime Library Exception along with this program; + see the files COPYING3 and COPYING.RUNTIME respectively. If not, see + <http://www.gnu.org/licenses/>. */ + + +#ifdef L_arm_addsubsf3 + +FUNC_START aeabi_frsub + + push {r4, lr} + movs r4, #1 + lsls r4, #31 + eors r0, r0, r4 + bl __aeabi_fadd + pop {r4, pc} + + FUNC_END aeabi_frsub + +#endif /* L_arm_addsubsf3 */ + diff --git a/libgcc/config/arm/lib1funcs.S b/libgcc/config/arm/lib1funcs.S index 236b7a7763f..31132633f32 100644 --- a/libgcc/config/arm/lib1funcs.S +++ b/libgcc/config/arm/lib1funcs.S @@ -2012,6 +2012,7 @@ LSYM(Lchange_\register): #include "bpabi-v6m.S" #include "eabi/fplib.h" #include "eabi/fcmp.S" +#include "eabi/fadd.S" #endif /* NOT_ISA_TARGET_32BIT */ #include "eabi/lcmp.S" #endif /* !__symbian__ */ -- 2.25.1
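The moved wrapper simply flips the sign bit of the first operand and defers to __aeabi_fadd. Equivalently, in C (a sketch of the bit trick, not the patch itself):

    extern float __aeabi_fadd (float, float);

    float frsub_ref (float a, float b)     /* returns b - a */
    {
      union { float f; unsigned int u; } t = { a };
      t.u ^= 0x80000000u;                  /* the 'eors r0, r0, r4' above */
      return __aeabi_fadd (t.f, b);
    }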
[PATCH v5 26/33] Import float addition and subtraction from the CM0 library
Since this is the first import of single-precision functions, some common parsing and formatting routines are also included. These common rotines will be referenced by other functions in subsequent commits. However, even if the size penalty is accounted entirely to __addsf3(), the total compiled size is still less than half the size of soft-float. gcc/libgcc/ChangeLog: 2021-01-13 Daniel Engel * config/arm/eabi/fadd.S (__addsf3, __subsf3): Added new functions. * config/arm/eabi/fneg.S (__negsf2): Added new file. * config/arm/eabi/futil.S (__fp_normalize2, __fp_lalign2, __fp_assemble, __fp_overflow, __fp_zero, __fp_check_nan): Added new file with shared helper functions. * config/arm/lib1funcs.S: #include eabi/fneg.S and eabi/futil.S (v6m only). * config/arm/t-elf (LIB1ASMFUNCS): Added _arm_addsf3, _arm_frsubsf3, _fp_exceptionf, _fp_checknanf, _fp_assemblef, and _fp_normalizef. --- libgcc/config/arm/eabi/fadd.S | 306 +++- libgcc/config/arm/eabi/fneg.S | 76 ++ libgcc/config/arm/eabi/fplib.h | 3 - libgcc/config/arm/eabi/futil.S | 418 + libgcc/config/arm/lib1funcs.S | 2 + libgcc/config/arm/t-elf| 6 + 6 files changed, 798 insertions(+), 13 deletions(-) create mode 100644 libgcc/config/arm/eabi/fneg.S create mode 100644 libgcc/config/arm/eabi/futil.S diff --git a/libgcc/config/arm/eabi/fadd.S b/libgcc/config/arm/eabi/fadd.S index fffbd91d1bc..77b81d62b3b 100644 --- a/libgcc/config/arm/eabi/fadd.S +++ b/libgcc/config/arm/eabi/fadd.S @@ -1,5 +1,7 @@ -/* Copyright (C) 2006-2021 Free Software Foundation, Inc. - Contributed by CodeSourcery. +/* fadd.S: Thumb-1 optimized 32-bit float addition and subtraction + + Copyright (C) 2018-2021 Free Software Foundation, Inc. + Contributed by Daniel Engel, Senva Inc (g...@danielengel.com) This file is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the @@ -21,18 +23,302 @@ <http://www.gnu.org/licenses/>. */ +#ifdef L_arm_frsubsf3 + +// float __aeabi_frsub(float, float) +// Returns the floating point difference of $r1 - $r0 in $r0. +// Subsection ordering within fpcore keeps conditional branches within range. +FUNC_START_SECTION aeabi_frsub .text.sorted.libgcc.fpcore.b.frsub +CFI_START_FUNCTION + + #if defined(STRICT_NANS) && STRICT_NANS +// Check if $r0 is NAN before modifying. +lslsr2, r0, #1 +movsr3, #255 +lslsr3, #24 + +// Let fadd() find the NAN in the normal course of operation, +// moving it to $r0 and checking the quiet/signaling bit. +cmp r2, r3 +bhi SYM(__aeabi_fadd) + #endif + +// Flip sign and run through fadd(). +movsr2, #1 +lslsr2, #31 +addsr0, r2 +b SYM(__aeabi_fadd) + +CFI_END_FUNCTION +FUNC_END aeabi_frsub + +#endif /* L_arm_frsubsf3 */ + + #ifdef L_arm_addsubsf3 -FUNC_START aeabi_frsub +// float __aeabi_fsub(float, float) +// Returns the floating point difference of $r0 - $r1 in $r0. +// Subsection ordering within fpcore keeps conditional branches within range. +FUNC_START_SECTION aeabi_fsub .text.sorted.libgcc.fpcore.c.faddsub +FUNC_ALIAS subsf3 aeabi_fsub +CFI_START_FUNCTION - push {r4, lr} - movs r4, #1 - lsls r4, #31 - eors r0, r0, r4 - bl __aeabi_fadd - pop {r4, pc} + #if defined(STRICT_NANS) && STRICT_NANS +// Check if $r1 is NAN before modifying. +lslsr2, r1, #1 +movsr3, #255 +lslsr3, #24 - FUNC_END aeabi_frsub +// Let fadd() find the NAN in the normal course of operation, +// moving it to $r0 and checking the quiet/signaling bit. +cmp r2, r3 +bhi SYM(__aeabi_fadd) + #endif + +// Flip sign and fall into fadd(). 
+movsr2, #1 +lslsr2, #31 +addsr1, r2 #endif /* L_arm_addsubsf3 */ + +// The execution of __subsf3() flows directly into __addsf3(), such that +// instructions must appear consecutively in the same memory section. +// However, this construction inhibits the ability to discard __subsf3() +// when only using __addsf3(). +// Therefore, this block configures __addsf3() for compilation twice. +// The first version is a minimal standalone implementation, and the second +// version is the continuation of __subsf3(). The standalone version must +// be declared WEAK, so that the combined version can supersede it and +// provide both symbols when required. +// '_arm_addsf3' should appear before '_arm_addsubsf3' in LIB1ASMFUNCS. +#if defined(L_arm_addsf3) || defined(L_arm_addsubsf3) + +#ifdef L_arm_addsf3 +// float __aeabi_fadd(fl
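The STRICT_NANS prologue above checks whether an operand is NAN before flipping its sign, using a shifted-exponent comparison. In C, the test is (sketch, hypothetical name):

    /* NAN iff, with the sign shifted out, the remaining bits exceed an
       all-ones exponent with an empty mantissa (0xff000000).  */
    int isnan_bits_ref (unsigned int bits)
    {
      return (bits << 1) > 0xff000000u;    /* lsls #1 ; cmp ; bhi */
    }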
[PATCH v5 27/33] Import float multiplication from the CM0 library
gcc/libgcc/ChangeLog: 2021-01-13 Daniel Engel * config/arm/eabi/fmul.S (__mulsf3): New file. * config/arm/lib1funcs.S: #include eabi/fmul.S (v6m only). * config/arm/t-elf (LIB1ASMFUNCS): Moved _mulsf3 to global scope (this object was previously blocked on v6m builds). --- libgcc/config/arm/eabi/fmul.S | 215 ++ libgcc/config/arm/lib1funcs.S | 1 + libgcc/config/arm/t-elf | 3 +- 3 files changed, 218 insertions(+), 1 deletion(-) create mode 100644 libgcc/config/arm/eabi/fmul.S diff --git a/libgcc/config/arm/eabi/fmul.S b/libgcc/config/arm/eabi/fmul.S new file mode 100644 index 000..767de988f0b --- /dev/null +++ b/libgcc/config/arm/eabi/fmul.S @@ -0,0 +1,215 @@ +/* fmul.S: Thumb-1 optimized 32-bit float multiplication + + Copyright (C) 2018-2021 Free Software Foundation, Inc. + Contributed by Daniel Engel, Senva Inc (g...@danielengel.com) + + This file is free software; you can redistribute it and/or modify it + under the terms of the GNU General Public License as published by the + Free Software Foundation; either version 3, or (at your option) any + later version. + + This file is distributed in the hope that it will be useful, but + WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + General Public License for more details. + + Under Section 7 of GPL version 3, you are granted additional + permissions described in the GCC Runtime Library Exception, version + 3.1, as published by the Free Software Foundation. + + You should have received a copy of the GNU General Public License and + a copy of the GCC Runtime Library Exception along with this program; + see the files COPYING3 and COPYING.RUNTIME respectively. If not, see + <http://www.gnu.org/licenses/>. */ + + +#ifdef L_arm_mulsf3 + +// float __aeabi_fmul(float, float) +// Returns $r0 after multiplication by $r1. +// Subsection ordering within fpcore keeps conditional branches within range. +FUNC_START_SECTION aeabi_fmul .text.sorted.libgcc.fpcore.m.fmul +FUNC_ALIAS mulsf3 aeabi_fmul +CFI_START_FUNCTION + +// Standard registers, compatible with exception handling. +push{ rT, lr } +.cfi_remember_state +.cfi_remember_state +.cfi_adjust_cfa_offset 8 +.cfi_rel_offset rT, 0 +.cfi_rel_offset lr, 4 + +// Save the sign of the result. +movsrT, r1 +eorsrT, r0 +lsrsrT, #31 +lslsrT, #31 +mov ip, rT + +// Set up INF for comparison. +movsrT, #255 +lslsrT, #24 + +// Check for multiplication by zero. +lslsr2, r0, #1 +beq LLSYM(__fmul_zero1) + +lslsr3, r1, #1 +beq LLSYM(__fmul_zero2) + +// Check for INF/NAN. +cmp r3, rT +bhs LLSYM(__fmul_special2) + +cmp r2, rT +bhs LLSYM(__fmul_special1) + +// Because neither operand is INF/NAN, the result will be finite. +// It is now safe to modify the original operand registers. +lslsr0, #9 + +// Isolate the first exponent. When normal, add back the implicit '1'. +// The result is always aligned with the MSB in bit [31]. +// Subnormal mantissas remain effectively multiplied by 2x relative to +// normals, but this works because the weight of a subnormal is -126. +lsrsr2, #24 +beq LLSYM(__fmul_normalize2) +addsr0, #1 +rorsr0, r0 + +LLSYM(__fmul_normalize2): +// IMPORTANT: exp10i() jumps in here! +// Repeat for the mantissa of the second operand. +// Short-circuit when the mantissa is 1.0, as the +// first mantissa is already prepared in $r0 +lslsr1, #9 + +// When normal, add back the implicit '1'. +lsrsr3, #24 +beq LLSYM(__fmul_go) +addsr1, #1 +rorsr1, r1 + +LLSYM(__fmul_go): +// Calculate the final exponent, relative to bit [30]. 
+addsrT, r2, r3 +subsrT, #127 + + #if !defined(__OPTIMIZE_SIZE__) || !__OPTIMIZE_SIZE__ +// Short-circuit on multiplication by powers of 2. +lslsr3, r0, #1 +beq LLSYM(__fmul_simple1) + +lslsr3, r1, #1 +beq LLSYM(__fmul_simple2) + #endif + +// Save $ip across the call. +// (Alternatively, could push/pop a separate register, +// but the four instructions here are equivally fast) +// without imposing on the stack. +add rT, ip + +// 32x32 unsigned multiplication, 64 bit result. +bl
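The unpacking steps above left-align each mantissa at bit 31 and restore the implicit '1' for normal values; subnormals are left as-is, since their effective 2x scaling cancels against the fixed -126 weight. A C sketch of that normalization (hypothetical name, reference only):

    /* Left-align the mantissa at bit 31; normals get the implicit '1'.  */
    unsigned int fmul_mant_ref (unsigned int bits, unsigned int *exp)
    {
      unsigned int m = bits << 9;          /* drop sign and exponent  */
      *exp = (bits << 1) >> 24;            /* biased 8-bit exponent   */
      if (*exp != 0)                       /* normal: restore the '1' */
        m = (m >> 1) | 0x80000000u;
      return m;
    }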
[PATCH v5 28/33] Import float division from the CM0 library
gcc/libgcc/ChangeLog: 2021-01-08 Daniel Engel * config/arm/eabi/fdiv.S (__divsf3, __fp_divloopf): New file. * config/arm/lib1funcs.S: #include eabi/fdiv.S (v6m only). * config/arm/t-elf (LIB1ASMFUNCS): Added _divsf3 and _fp_divloopf. --- libgcc/config/arm/eabi/fdiv.S | 261 ++ libgcc/config/arm/lib1funcs.S | 1 + libgcc/config/arm/t-elf | 2 + 3 files changed, 264 insertions(+) create mode 100644 libgcc/config/arm/eabi/fdiv.S diff --git a/libgcc/config/arm/eabi/fdiv.S b/libgcc/config/arm/eabi/fdiv.S new file mode 100644 index 000..118f4e94676 --- /dev/null +++ b/libgcc/config/arm/eabi/fdiv.S @@ -0,0 +1,261 @@ +/* fdiv.S: Cortex M0 optimized 32-bit float division + + Copyright (C) 2018-2021 Free Software Foundation, Inc. + Contributed by Daniel Engel, Senva Inc (g...@danielengel.com) + + This file is free software; you can redistribute it and/or modify it + under the terms of the GNU General Public License as published by the + Free Software Foundation; either version 3, or (at your option) any + later version. + + This file is distributed in the hope that it will be useful, but + WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + General Public License for more details. + + Under Section 7 of GPL version 3, you are granted additional + permissions described in the GCC Runtime Library Exception, version + 3.1, as published by the Free Software Foundation. + + You should have received a copy of the GNU General Public License and + a copy of the GCC Runtime Library Exception along with this program; + see the files COPYING3 and COPYING.RUNTIME respectively. If not, see + <http://www.gnu.org/licenses/>. */ + + +#ifdef L_arm_divsf3 + +// float __aeabi_fdiv(float, float) +// Returns $r0 after division by $r1. +// Subsection ordering within fpcore keeps conditional branches within range. +FUNC_START_SECTION aeabi_fdiv .text.sorted.libgcc.fpcore.n.fdiv +FUNC_ALIAS divsf3 aeabi_fdiv +CFI_START_FUNCTION + +// Standard registers, compatible with exception handling. +push{ rT, lr } +.cfi_remember_state +.cfi_remember_state +.cfi_adjust_cfa_offset 8 +.cfi_rel_offset rT, 0 +.cfi_rel_offset lr, 4 + +// Save for the sign of the result. +movsr3, r1 +eorsr3, r0 +lsrsrT, r3, #31 +lslsrT, #31 +mov ip, rT + +// Set up INF for comparison. +movsrT, #255 +lslsrT, #24 + +// Check for divide by 0. Automatically catches 0/0. +lslsr2, r1, #1 +beq LLSYM(__fdiv_by_zero) + +// Check for INF/INF, or a number divided by itself. +lslsr3, #1 +beq LLSYM(__fdiv_equal) + +// Check the numerator for INF/NAN. +eorsr3, r2 +cmp r3, rT +bhs LLSYM(__fdiv_special1) + +// Check the denominator for INF/NAN. +cmp r2, rT +bhs LLSYM(__fdiv_special2) + +// Check the numerator for zero. +cmp r3, #0 +beq SYM(__fp_zero) + +// No action if the numerator is subnormal. +// The mantissa will normalize naturally in the division loop. +lslsr0, #9 +lsrsr1, r3, #24 +beq LLSYM(__fdiv_denominator) + +// Restore the numerator's implicit '1'. +addsr0, #1 +rorsr0, r0 + +LLSYM(__fdiv_denominator): +// The denominator must be normalized and left aligned. +bl SYM(__fp_normalize2) + +// 25 bits of precision will be sufficient. +movsrT, #64 + +// Run division. +bl SYM(__fp_divloopf) +b SYM(__fp_assemble) + +LLSYM(__fdiv_equal): + #if defined(EXCEPTION_CODES) && EXCEPTION_CODES +movsr3, #(DIVISION_INF_BY_INF) + #endif + +// The absolute value of both operands are equal, but not 0. +// If both operands are INF, create a new NAN. 
+cmp r2, rT +beq SYM(__fp_exception) + + #if defined(TRAP_NANS) && TRAP_NANS +// If both operands are NAN, return the NAN in $r0. +bhi SYM(__fp_check_nan) + #else +bhi LLSYM(__fdiv_return) + #endif + +// Return 1.0f, with appropriate sign. +movsr0, #127 +lslsr0, #23 +add r0, ip + +LLSYM(__fdiv_return): +pop { rT, pc } +.cfi_restore_state + +LLSYM(__fdiv_special2): +// The denominator is either INF or NAN, numerator is neither. +// Also, the denominator is not equal to 0. +// If th
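With both mantissas left-aligned, the core of __fp_divloopf is plain restoring division, producing one quotient bit per step; 25 bits are enough to round a 24-bit significand correctly. A C sketch of that loop (the function name comes from the patch, but this body is hypothetical):

    /* Shift-and-subtract division of two bit-31-aligned mantissas.  */
    unsigned int divloop_ref (unsigned int num, unsigned int den, int bits)
    {
      unsigned long long n = num;          /* headroom for the shifts */
      unsigned int q = 0;
      while (bits-- > 0)
        {
          q <<= 1;
          if (n >= den) { n -= den; q |= 1; }
          n <<= 1;
        }
      return q;
    }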
[PATCH v5 29/33] Import integer-to-float conversion from the CM0 library
gcc/libgcc/ChangeLog:
2021-01-13 Daniel Engel 

	* config/arm/bpabi-lib.h (__floatdisf, __floatundisf):
	Remove obsolete RENAME_LIBRARY directives.
	* config/arm/eabi/ffloat.S (__aeabi_i2f, __aeabi_l2f, __aeabi_ui2f,
	__aeabi_ul2f): New file.
	* config/arm/lib1funcs.S: #include eabi/ffloat.S (v6m only).
	* config/arm/t-elf (LIB1ASMFUNCS): Added _arm_floatunsisf,
	_arm_floatsisf, and _internal_floatundisf.
	Moved _arm_floatundisf to the weak function group.
---
 libgcc/config/arm/bpabi-lib.h   |   6 -
 libgcc/config/arm/eabi/ffloat.S | 247 
 libgcc/config/arm/lib1funcs.S   |   1 +
 libgcc/config/arm/t-elf         |   5 +-
 4 files changed, 252 insertions(+), 7 deletions(-)
 create mode 100644 libgcc/config/arm/eabi/ffloat.S

diff --git a/libgcc/config/arm/bpabi-lib.h b/libgcc/config/arm/bpabi-lib.h
index 3cb90b4b345..1e651ead4ac 100644
--- a/libgcc/config/arm/bpabi-lib.h
+++ b/libgcc/config/arm/bpabi-lib.h
@@ -56,9 +56,6 @@
 #ifdef L_floatdidf
 #define DECLARE_LIBRARY_RENAMES RENAME_LIBRARY (floatdidf, l2d)
 #endif
-#ifdef L_floatdisf
-#define DECLARE_LIBRARY_RENAMES RENAME_LIBRARY (floatdisf, l2f)
-#endif
 /* These renames are needed on ARMv6M.  Other targets get them from
    assembly routines.  */
@@ -71,9 +68,6 @@
 #ifdef L_floatundidf
 #define DECLARE_LIBRARY_RENAMES RENAME_LIBRARY (floatundidf, ul2d)
 #endif
-#ifdef L_floatundisf
-#define DECLARE_LIBRARY_RENAMES RENAME_LIBRARY (floatundisf, ul2f)
-#endif
 /* For ARM bpabi, we only want to use a "__gnu_" prefix for the fixed-point
    helper functions - not everything in libgcc - in the interests of
diff --git a/libgcc/config/arm/eabi/ffloat.S b/libgcc/config/arm/eabi/ffloat.S
new file mode 100644
index 000..9690ab85081
--- /dev/null
+++ b/libgcc/config/arm/eabi/ffloat.S
@@ -0,0 +1,247 @@
+/* ffloat.S: Thumb-1 optimized integer-to-float conversion
+
+   Copyright (C) 2018-2021 Free Software Foundation, Inc.
+   Contributed by Daniel Engel, Senva Inc (g...@danielengel.com)
+
+   This file is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by the
+   Free Software Foundation; either version 3, or (at your option) any
+   later version.
+
+   This file is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+
+#ifdef L_arm_floatsisf
+
+// float __aeabi_i2f(int)
+// Converts a signed integer in $r0 to float.
+
+// On little-endian cores (including all Cortex-M), __floatsisf() can be
+//  implemented as below in 5 instructions.  However, it can also be
+//  implemented by prefixing a single instruction to __floatdisf().
+// A memory savings of 4 instructions at a cost of only 2 execution cycles
+//  seems reasonable enough.  Plus, the trade-off only happens in programs
+//  that require both __floatsisf() and __floatdisf().  Programs only using
+//  __floatsisf() always get the smallest version.
+// When the combined version is provided, this standalone version
+//  must be declared WEAK, so that the combined version can supersede it.
+// '_arm_floatsisf' should appear before '_arm_floatdisf' in LIB1ASMFUNCS. +// Same parent section as __ul2f() to keep tail call branch within range. +#if defined(__OPTIMIZE_SIZE__) && __OPTIMIZE_SIZE__ +WEAK_START_SECTION aeabi_i2f .text.sorted.libgcc.fpcore.p.floatsisf +WEAK_ALIAS floatsisf aeabi_i2f +CFI_START_FUNCTION + +#else /* !__OPTIMIZE_SIZE__ */ +FUNC_START_SECTION aeabi_i2f .text.sorted.libgcc.fpcore.p.floatsisf +FUNC_ALIAS floatsisf aeabi_i2f +CFI_START_FUNCTION + +#endif /* !__OPTIMIZE_SIZE__ */ + +// Save the sign. +asrsr3, r0, #31 + +// Absolute value of the input. +eorsr0, r3 +subsr0, r3 + +// Sign extension to long long unsigned. +eorsr1, r1 +b SYM(__internal_floatundisf_noswap) + +CFI_END_FUNCTION +FUNC_END floatsisf +FUNC_END aeabi_i2f + +#endif /* L_arm_floatsisf */ + + +#ifdef L_arm_floatdisf + +// float __aeabi_l2f(long long) +// Converts a signed 64-bit integer in $r1:$r0 to a float in $r0. +// See build comments for __floatsisf() above. +// Same parent section as __ul2f() to keep tail call branch within range. +#if defined(__OPTIMIZE_SIZE
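The space/speed trade described above is easy to see in C.  Here is a
behavioral sketch (hypothetical name, not the library entry point) of
the "one prefix instruction" idea: save the sign, convert the magnitude
through the shared unsigned 64-bit path, then reapply the sign:

    #include <stdint.h>

    /* Sketch of implementing i2f on top of the unsigned 64-bit
       converter, as described above (model only).  */
    float i2f_model (int32_t x)
    {
      int neg = x < 0;
      uint32_t mag = neg ? -(uint32_t) x : (uint32_t) x;  /* absolute value */
      float f = (float) (uint64_t) mag;   /* shared __floatundisf-style path */
      return neg ? -f : f;
    }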
[PATCH v5 31/33] Import float<->double conversion from the CM0 library
gcc/libgcc/ChangeLog: 2021-01-13 Daniel Engel * config/arm/eabi/fcast.S (__aeabi_d2f, __aeabi_f2d): New file. * config/arm/lib1funcs.S: #include eabi/fcast.S (v6m only). * config/arm/t-elf (LIB1ASMFUNCS): Added _arm_d2f and _arm_f2d. --- libgcc/config/arm/eabi/fcast.S | 256 + libgcc/config/arm/lib1funcs.S | 1 + libgcc/config/arm/t-elf| 2 + 3 files changed, 259 insertions(+) create mode 100644 libgcc/config/arm/eabi/fcast.S diff --git a/libgcc/config/arm/eabi/fcast.S b/libgcc/config/arm/eabi/fcast.S new file mode 100644 index 000..b1184ee1d53 --- /dev/null +++ b/libgcc/config/arm/eabi/fcast.S @@ -0,0 +1,256 @@ +/* fcast.S: Thumb-1 optimized 32- and 64-bit float conversions + + Copyright (C) 2018-2021 Free Software Foundation, Inc. + Contributed by Daniel Engel, Senva Inc (g...@danielengel.com) + + This file is free software; you can redistribute it and/or modify it + under the terms of the GNU General Public License as published by the + Free Software Foundation; either version 3, or (at your option) any + later version. + + This file is distributed in the hope that it will be useful, but + WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + General Public License for more details. + + Under Section 7 of GPL version 3, you are granted additional + permissions described in the GCC Runtime Library Exception, version + 3.1, as published by the Free Software Foundation. + + You should have received a copy of the GNU General Public License and + a copy of the GCC Runtime Library Exception along with this program; + see the files COPYING3 and COPYING.RUNTIME respectively. If not, see + <http://www.gnu.org/licenses/>. */ + + +#ifdef L_arm_f2d + +// double __aeabi_f2d(float) +// Converts a single-precision float in $r0 to double-precision in $r1:$r0. +// Rounding, overflow, and underflow are impossible. +// INF and ZERO are returned unmodified. +FUNC_START_SECTION aeabi_f2d .text.sorted.libgcc.fpcore.v.f2d +FUNC_ALIAS extendsfdf2 aeabi_f2d +CFI_START_FUNCTION + +// Save the sign. +lsrsr1, r0, #31 +lslsr1, #31 + +// Set up registers for __fp_normalize2(). +push{ rT, lr } +.cfi_remember_state +.cfi_adjust_cfa_offset 8 +.cfi_rel_offset rT, 0 +.cfi_rel_offset lr, 4 + +// Test for zero. +lslsr0, #1 +beq LLSYM(__f2d_return) + +// Split the exponent and mantissa into separate registers. +// This is the most efficient way to convert subnormals in the +// half-precision form into normals in single-precision. +// This does add a leading implicit '1' to INF and NAN, +// but that will be absorbed when the value is re-assembled. +movsr2, r0 +bl SYM(__fp_normalize2) __PLT__ + +// Set up the exponent bias. For INF/NAN values, the bias +// is 1791 (2047 - 255 - 1), where the last '1' accounts +// for the implicit '1' in the mantissa. +movsr0, #3 +lslsr0, #9 +addsr0, #255 + +// Test for INF/NAN, promote exponent if necessary +cmp r2, #255 +beq LLSYM(__f2d_indefinite) + +// For normal values, the exponent bias is 895 (1023 - 127 - 1), +// which is half of the prepared INF/NAN bias. +lsrsr0, #1 + +LLSYM(__f2d_indefinite): +// Assemble exponent with bias correction. +addsr2, r0 +lslsr2, #20 +addsr1, r2 + +// Assemble the high word of the mantissa. +lsrsr0, r3, #11 +add r1, r0 + +// Remainder of the mantissa in the low word of the result. 
+        lsls    r0, r3, #21
+
+    LLSYM(__f2d_return):
+        pop     { rT, pc }
+        .cfi_restore_state
+
+CFI_END_FUNCTION
+FUNC_END extendsfdf2
+FUNC_END aeabi_f2d
+
+#endif /* L_arm_f2d */
+
+
+#if defined(L_arm_d2f) || defined(L_arm_truncdfsf2)
+
+// HACK: Build two separate implementations:
+//  * __aeabi_d2f() rounds to nearest per traditional IEEE-754 rules.
+//  * __truncdfsf2() rounds towards zero per GCC specification.
+// Presumably, a program will consistently use one ABI or the other,
+//  which means that code size will not be duplicated in practice.
+// Merging two versions with dynamic rounding would be rather hard.
+#ifdef L_arm_truncdfsf2
+  #define D2F_NAME truncdfsf2
+  #define D2F_SECTION .text.sorted.libgcc.fpcore.x.truncdfsf2
+#else
+  #define D2F_NAME aeabi_d2f
+  #define D2F_SECTION .text.sorted.libgcc.fpcore.w.d2f
+#endif
+
+// float __aeabi_d2f(double)
+// Converts a double-precision float in $r1:$r0 to single-precision in $r0.
/
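For reference, the exponent arithmetic described above corresponds to
the usual bit-level widening.  A C sketch follows (hypothetical name;
nonzero subnormal inputs are omitted, which is the case the assembly
hands to __fp_normalize2).  Note the flat field rebias below is
exp + 896, i.e. 1023 - 127; the routine's 895 differs by one because
its normalized mantissa carries an explicit leading '1':

    #include <stdint.h>
    #include <string.h>

    /* Bit-level model of extending float to double (sketch only).  */
    double f2d_model (float f)
    {
      uint32_t u;  memcpy (&u, &f, 4);

      uint64_t sign = (uint64_t) (u >> 31) << 63;
      uint32_t exp  = (u >> 23) & 0xFF;
      uint64_t mant = u & 0x007FFFFFu;
      uint64_t d;

      if (exp == 0 && mant == 0)
        d = sign;                                       /* +/- zero */
      else if (exp == 0xFF)                             /* INF/NAN */
        d = sign | (0x7FFull << 52) | (mant << 29);
      else                                              /* normal values */
        d = sign | ((uint64_t) (exp + 896) << 52) | (mant << 29);

      double r;  memcpy (&r, &d, 8);
      return r;   /* nonzero subnormal inputs omitted for brevity */
    }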
[PATCH v5 30/33] Import float-to-integer conversion from the CM0 library
gcc/libgcc/ChangeLog: 2021-01-13 Daniel Engel * config/arm/bpabi-lib.h (muldi3): Removed duplicate. (fixunssfsi) Removed obsolete RENAME_LIBRARY directive. * config/arm/eabi/ffixed.S (__aeabi_f2iz, __aeabi_f2uiz, __aeabi_f2lz, __aeabi_f2ulz): New file. * config/arm/lib1funcs.S: #include eabi/ffixed.S (v6m only). * config/arm/t-elf (LIB1ASMFUNCS): Added _internal_fixsfdi, _internal_fixsfsi, _arm_fixsfdi, and _arm_fixunssfdi. --- libgcc/config/arm/bpabi-lib.h | 6 - libgcc/config/arm/eabi/ffixed.S | 414 libgcc/config/arm/lib1funcs.S | 1 + libgcc/config/arm/t-elf | 4 + 4 files changed, 419 insertions(+), 6 deletions(-) create mode 100644 libgcc/config/arm/eabi/ffixed.S diff --git a/libgcc/config/arm/bpabi-lib.h b/libgcc/config/arm/bpabi-lib.h index 1e651ead4ac..a1c631640bb 100644 --- a/libgcc/config/arm/bpabi-lib.h +++ b/libgcc/config/arm/bpabi-lib.h @@ -32,9 +32,6 @@ #ifdef L_muldi3 #define DECLARE_LIBRARY_RENAMES RENAME_LIBRARY (muldi3, lmul) #endif -#ifdef L_muldi3 -#define DECLARE_LIBRARY_RENAMES RENAME_LIBRARY (muldi3, lmul) -#endif #ifdef L_fixdfdi #define DECLARE_LIBRARY_RENAMES RENAME_LIBRARY (fixdfdi, d2lz) \ extern DWtype __fixdfdi (DFtype) __attribute__((pcs("aapcs"))); \ @@ -62,9 +59,6 @@ #ifdef L_fixunsdfsi #define DECLARE_LIBRARY_RENAMES RENAME_LIBRARY (fixunsdfsi, d2uiz) #endif -#ifdef L_fixunssfsi -#define DECLARE_LIBRARY_RENAMES RENAME_LIBRARY (fixunssfsi, f2uiz) -#endif #ifdef L_floatundidf #define DECLARE_LIBRARY_RENAMES RENAME_LIBRARY (floatundidf, ul2d) #endif diff --git a/libgcc/config/arm/eabi/ffixed.S b/libgcc/config/arm/eabi/ffixed.S new file mode 100644 index 000..8ced3a701ff --- /dev/null +++ b/libgcc/config/arm/eabi/ffixed.S @@ -0,0 +1,414 @@ +/* ffixed.S: Thumb-1 optimized float-to-integer conversion + + Copyright (C) 2018-2021 Free Software Foundation, Inc. + Contributed by Daniel Engel, Senva Inc (g...@danielengel.com) + + This file is free software; you can redistribute it and/or modify it + under the terms of the GNU General Public License as published by the + Free Software Foundation; either version 3, or (at your option) any + later version. + + This file is distributed in the hope that it will be useful, but + WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + General Public License for more details. + + Under Section 7 of GPL version 3, you are granted additional + permissions described in the GCC Runtime Library Exception, version + 3.1, as published by the Free Software Foundation. + + You should have received a copy of the GNU General Public License and + a copy of the GCC Runtime Library Exception along with this program; + see the files COPYING3 and COPYING.RUNTIME respectively. If not, see + <http://www.gnu.org/licenses/>. */ + + +// The implementation of __aeabi_f2uiz() expects to tail call __internal_f2iz() +// with the flags register set for unsigned conversion. The __internal_f2iz() +// symbol itself is unambiguous, but there is a remote risk that the linker +// will prefer some other symbol in place of __aeabi_f2iz(). Importing an +// archive file that exports __aeabi_f2iz() will throw an error in this case. +// As a workaround, this block configures __aeabi_f2iz() for compilation twice. +// The first version configures __internal_f2iz() as a WEAK standalone symbol, +// and the second exports __aeabi_f2iz() and __internal_f2iz() normally. +// A small bonus: programs only using __aeabi_f2uiz() will be slightly smaller. 
+// '_internal_fixsfsi' should appear before '_arm_fixsfsi' in LIB1ASMFUNCS. +#if defined(L_arm_fixsfsi) || \ + (defined(L_internal_fixsfsi) && \ + !(defined(__OPTIMIZE_SIZE__) && __OPTIMIZE_SIZE__)) + +// Subsection ordering within fpcore keeps conditional branches within range. +#define F2IZ_SECTION .text.sorted.libgcc.fpcore.r.fixsfsi + +// int __aeabi_f2iz(float) +// Converts a float in $r0 to signed integer, rounding toward 0. +// Values out of range are forced to either INT_MAX or INT_MIN. +// NAN becomes zero. +#ifdef L_arm_fixsfsi +FUNC_START_SECTION aeabi_f2iz F2IZ_SECTION +FUNC_ALIAS fixsfsi aeabi_f2iz +CFI_START_FUNCTION +#endif + + #if defined(__OPTIMIZE_SIZE__) && __OPTIMIZE_SIZE__ +// Flag for unsigned conversion. +movsr1, #33 +b SYM(__internal_fixsfdi) + + #else /* !__OPTIMIZE_SIZE__ */ + +#ifdef L_arm_fixsfsi +// Flag for signed conversion. +movsr3, #1 + +// [unsigned] int internal_f2iz(float, int) +// Internal function expects a boolean flag in $r1. +// If the boolean flag is 0, the result is unsigned. +// If the boolean flag is 1, the result is signed. +FUNC_ENTRY internal_f2iz + +#else /* L_internal_fixsfsi */ +WEAK_STAR
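As a behavioral reference for the conversion being configured here, the
specified semantics (truncate toward zero, saturate out-of-range values,
NAN becomes zero) look like this in C.  Hypothetical name; the two-entry
signed/unsigned flag mechanism of __internal_f2iz() is not modeled:

    #include <stdint.h>
    #include <string.h>

    /* Behavioral model of f2iz as specified above (sketch only).  */
    int32_t f2iz_model (float f)
    {
      uint32_t u;  memcpy (&u, &f, 4);
      int32_t exp = (int32_t) ((u >> 23) & 0xFF) - 127;   /* unbiased */

      if (((u << 1) >> 24) == 0xFF && (u << 9) != 0)
        return 0;                               /* NAN becomes zero */
      if (exp < 0)
        return 0;                               /* |f| < 1 truncates to 0 */
      if (exp > 30)                             /* out of range, or INF */
        return (u >> 31) ? INT32_MIN : INT32_MAX;

      uint32_t mant = (u & 0x007FFFFFu) | 0x00800000u;    /* implicit '1' */
      uint32_t mag = (exp >= 23) ? mant << (exp - 23) : mant >> (23 - exp);
      return (u >> 31) ? -(int32_t) mag : (int32_t) mag;
    }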
[PATCH v5 33/33] Drop single-precision Thumb-1 soft-float functions
function symbols first: _subQQ.o, _cmpQQ.o, etc.  The fixed-point archive
elements appear after the _arm_* archive elements, so the initial
definitions of the floating point functions are discarded.  However, the
fixed-point functions contain unresolved symbol references which the
linker registers progressively.

Given that the default libgcc.a does not build the soft-float library [1],
the linker cannot import any floating point objects until the second pass.
However, when v6-m/nofp/libgcc.a _does_ include the soft-float library,
the linker proceeds to import some floating point objects during the
first pass.

To test this theory, add explicit symbol references to convert-sat.c:

--- a/gcc/testsuite/gcc.dg/fixed-point/convert-sat.c
+++ b/gcc/testsuite/gcc.dg/fixed-point/convert-sat.c
@@ -11,6 +11,12 @@ extern void abort (void);
 int main ()
 {
+  volatile float a = 1.0;
+  volatile float b = 2.0;
+  volatile float c = a * b;
+  volatile double d = a;
+  volatile int e = a;
+
   SAT_CONV1 (short _Accum, hk);
   SAT_CONV1 (_Accum, k);
   SAT_CONV1 (long _Accum, lk);

Afterwards, the linker imports the expected symbols:

...
==> (/home/mirdan/gcc-obj/gcc/thumb/v6-m/nofp/libgcc.a)_arm_mulsf3.o
==> (/home/mirdan/gcc-obj/gcc/thumb/v6-m/nofp/libgcc.a)_muldi3.o
==> (/home/mirdan/gcc-obj/gcc/thumb/v6-m/nofp/libgcc.a)_arm_fixsfsi.o
(/home/mirdan/gcc-obj/gcc/thumb/v6-m/nofp/libgcc.a)_arm_f2d.o
(/home/mirdan/gcc-obj/gcc/thumb/v6-m/nofp/libgcc.a)_fp_exceptionf.o
(/home/mirdan/gcc-obj/gcc/thumb/v6-m/nofp/libgcc.a)_fp_assemblef.o
(/home/mirdan/gcc-obj/gcc/thumb/v6-m/nofp/libgcc.a)_fp_normalizef.o
...
(/home/mirdan/gcc-obj/gcc/thumb/v6-m/nofp/libgcc.a)muldf3.o
(/home/mirdan/gcc-obj/gcc/thumb/v6-m/nofp/libgcc.a)fixdfsi.o
(/home/mirdan/gcc-obj/gcc/thumb/v6-m/nofp/libgcc.a)_clzsi2.o
(/home/mirdan/gcc-obj/gcc/thumb/v6-m/nofp/libgcc.a)_arm_fixunssfsi.o
(/home/mirdan/gcc-obj/gcc/thumb/v6-m/nofp/libgcc.a)_arm_fcmpge.o
(/home/mirdan/gcc-obj/gcc/thumb/v6-m/nofp/libgcc.a)_arm_fcmple.o
(/home/mirdan/gcc-obj/gcc/thumb/v6-m/nofp/libgcc.a)_arm_fixsfdi.o
(/home/mirdan/gcc-obj/gcc/thumb/v6-m/nofp/libgcc.a)_arm_fixunssfdi.o
(/home/mirdan/gcc-obj/gcc/thumb/v6-m/nofp/libgcc.a)_arm_cmpdf2.o
(/home/mirdan/gcc-obj/gcc/thumb/v6-m/nofp/libgcc.a)_fixunsdfsi.o
(/home/mirdan/gcc-obj/gcc/thumb/v6-m/nofp/libgcc.a)_fixdfdi.o
(/home/mirdan/gcc-obj/gcc/thumb/v6-m/nofp/libgcc.a)_fixunsdfdi.o
(/home/mirdan/gcc-obj/gcc/thumb/v6-m/nofp/libgcc.a)eqdf2.o
(/home/mirdan/gcc-obj/gcc/thumb/v6-m/nofp/libgcc.a)gedf2.o
(/home/mirdan/gcc-obj/gcc/thumb/v6-m/nofp/libgcc.a)ledf2.o
(/home/mirdan/gcc-obj/gcc/thumb/v6-m/nofp/libgcc.a)subdf3.o
(/home/mirdan/gcc-obj/gcc/thumb/v6-m/nofp/libgcc.a)floatunsidf.o
(/home/mirdan/gcc-obj/gcc/thumb/v6-m/nofp/libgcc.a)_arm_cmpsf2.o
...

At a minimum this behavior results in the use of non-preferred code in an
affected application.  However, as long as each object exports a single
entry point, this does not automatically result in a build failure.
Indeed, in the case of __aeabi_fmul() and __aeabi_f2d(), all references
seem to resolve uniformly in favor of the soft-float library.  The first
pass that imports the soft-float version of __aeabi_f2iz() also succeeds.

However, the first pass fails to find __aeabi_f2uiz(), since the
soft-float library does not implement this variant.  So, this symbol
remains undefined until the second pass.  However, the assembly version
of __aeabi_f2uiz() that the linker finds happens to be implemented as a
branch to __internal_f2iz() [2].
But the linker, importing __internal_f2iz(), also finds the main entry point __aeabi_f2iz(). And, since __aeabi_f2iz() was already found in the soft-float library, the linker throws an error. The solution is two-fold. First, the assembly routines have separately been made robust against this potential error condition (by weakening and splitting symbols). Second, this commit to block single-precision functions from the soft-float library makes it impossible for the linker to select a non-preferred version. Two duplicate symbols remain (extendsfdf2) and (truncdfsf2), but the situation is much improved. [1] softfp_wrap_start = "#if !__ARM_ARCH_ISA_ARM && __ARM_ARCH_ISA_THUMB == 1" [2] (These operations share a substantial portion of their code path, so this choice leads to a size reduction in programs that use both functions.) gcc/libgcc/ChangeLog: 2021-01-13 Daniel Engel * config/arm/t-softfp (softfp_float_modes): Added as "df". --- libgcc/config/arm/t-softfp | 2 ++ 1 file changed, 2 insertions(+) diff --git a/libgcc/config/arm/t-softfp b/libgcc/config/arm/t-softfp index 554ec9bc47b..bd6a4642e5f
[PATCH v5 32/33] Import float<->__fp16 conversion from the CM0 library
gcc/libgcc/ChangeLog: 2021-01-13 Daniel Engel * config/arm/eabi/fcast.S (__aeabi_h2f, __aeabi_f2h): Added functions. * config/arm/fp16 (__gnu_f2h_ieee, __gnu_h2f_ieee, __gnu_f2h_alternative, __gnu_h2f_alternative): Disable build for v6m multilibs. * config/arm/t-bpabi (LIB1ASMFUNCS): Added _aeabi_f2h_ieee, _aeabi_h2f_ieee, _aeabi_f2h_alt, and _aeabi_h2f_alt (v6m only). --- libgcc/config/arm/eabi/fcast.S | 277 + libgcc/config/arm/fp16.c | 4 + libgcc/config/arm/t-bpabi | 7 + 3 files changed, 288 insertions(+) diff --git a/libgcc/config/arm/eabi/fcast.S b/libgcc/config/arm/eabi/fcast.S index b1184ee1d53..e5a34d69578 100644 --- a/libgcc/config/arm/eabi/fcast.S +++ b/libgcc/config/arm/eabi/fcast.S @@ -254,3 +254,280 @@ FUNC_END D2F_NAME #endif /* L_arm_d2f || L_arm_truncdfsf2 */ + +#if defined(L_aeabi_h2f_ieee) || defined(L_aeabi_h2f_alt) + +#ifdef L_aeabi_h2f_ieee + #define H2F_NAME aeabi_h2f + #define H2F_ALIAS gnu_h2f_ieee +#else + #define H2F_NAME aeabi_h2f_alt + #define H2F_ALIAS gnu_h2f_alternative +#endif + +// float __aeabi_h2f(short hf) +// float __aeabi_h2f_alt(short hf) +// Converts a half-precision float in $r0 to single-precision. +// Rounding, overflow, and underflow conditions are impossible. +// In IEEE mode, INF, ZERO, and NAN are returned unmodified. +FUNC_START_SECTION H2F_NAME .text.sorted.libgcc.h2f +FUNC_ALIAS H2F_ALIAS H2F_NAME +CFI_START_FUNCTION + +// Set up registers for __fp_normalize2(). +push{ rT, lr } +.cfi_remember_state +.cfi_adjust_cfa_offset 8 +.cfi_rel_offset rT, 0 +.cfi_rel_offset lr, 4 + +// Save the mantissa and exponent. +lslsr2, r0, #17 + +// Isolate the sign. +lsrsr0, #15 +lslsr0, #31 + +// Align the exponent at bit[24] for normalization. +// If zero, return the original sign. +lsrsr2, #3 + + #ifdef __HAVE_FEATURE_IT +do_it eq +RETc(eq) + #else +beq LLSYM(__h2f_return) + #endif + +// Split the exponent and mantissa into separate registers. +// This is the most efficient way to convert subnormals in the +// half-precision form into normals in single-precision. +// This does add a leading implicit '1' to INF and NAN, +// but that will be absorbed when the value is re-assembled. +bl SYM(__fp_normalize2) __PLT__ + + #ifdef L_aeabi_h2f_ieee +// Set up the exponent bias. For INF/NAN values, the bias is 223, +// where the last '1' accounts for the implicit '1' in the mantissa. +addsr2, #(255 - 31 - 1) + +// Test for INF/NAN. +cmp r2, #254 + + #ifdef __HAVE_FEATURE_IT +do_it ne + #else +beq LLSYM(__h2f_assemble) + #endif + +// For normal values, the bias should have been 111. +// However, this offset must be adjusted per the INF check above. + IT(sub,ne) r2, #((255 - 31 - 1) - (127 - 15 - 1)) + +#else /* L_aeabi_h2f_alt */ +// Set up the exponent bias. All values are normal. +addsr2, #(127 - 15 - 1) +#endif + +LLSYM(__h2f_assemble): +// Combine exponent and sign. +lslsr2, #23 +addsr0, r2 + +// Combine mantissa. +lsrsr3, #8 +add r0, r3 + +LLSYM(__h2f_return): +pop { rT, pc } +.cfi_restore_state + +CFI_END_FUNCTION +FUNC_END H2F_NAME +FUNC_END H2F_ALIAS + +#endif /* L_aeabi_h2f_ieee || L_aeabi_h2f_alt */ + + +#if defined(L_aeabi_f2h_ieee) || defined(L_aeabi_f2h_alt) + +#ifdef L_aeabi_f2h_ieee + #define F2H_NAME aeabi_f2h + #define F2H_ALIAS gnu_f2h_ieee +#else + #define F2H_NAME aeabi_f2h_alt + #define F2H_ALIAS gnu_f2h_alternative +#endif + +// short __aeabi_f2h(float f) +// short __aeabi_f2h_alt(float f) +// Converts a single-precision float in $r0 to half-precision, +// rounding to nearest, ties to even. +// Values out of range are forced to either ZERO or INF. 
+// In IEEE mode, the upper 12 bits of a NAN will be preserved. +FUNC_START_SECTION F2H_NAME .text.sorted.libgcc.f2h +FUNC_ALIAS F2H_ALIAS F2H_NAME +CFI_START_FUNCTION + +// Set up the sign. +lsrsr2, r0, #31 +lslsr2, #15 + +// Save the exponent and mantissa. +// If ZERO, return the original sign. +lslsr0, #1 + + #ifdef __HAVE_FEATURE_IT +do_it ne,t +addne r0, r2 +RETc(ne) + #else +beq LLSYM(__f2h_return) + #endif + +// Isolate the exponent. +lsrsr1, r0, #24 + + #ifdef L_aeabi_f2h_ieee +// Check for NAN. +cmp r1, #255 +
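To make the rounding contract concrete, here is a C model of the behavior
specified above (hypothetical name; NAN payload handling and subnormal
half-precision results are simplified relative to the routine itself):

    #include <stdint.h>
    #include <string.h>

    /* Model of f2h: round to nearest, ties to even; overflow -> INF,
       underflow -> zero (subnormal half results simplified away).  */
    uint16_t f2h_model (float f)
    {
      uint32_t u;  memcpy (&u, &f, 4);
      uint16_t sign = (uint16_t) ((u >> 16) & 0x8000u);
      int32_t exp = (int32_t) ((u >> 23) & 0xFF) - 127;   /* unbiased */
      uint32_t mant = u & 0x007FFFFFu;

      if (exp == 128)                       /* INF or NAN */
        return mant ? (uint16_t) (sign | 0x7E00u | (mant >> 13))
                    : (uint16_t) (sign | 0x7C00u);
      if (exp > 15)                         /* too big for half: -> INF */
        return (uint16_t) (sign | 0x7C00u);
      if (exp < -14)                        /* too small: -> zero here */
        return sign;

      /* Normal result: 10 mantissa bits, round to nearest, ties to even. */
      uint32_t h = ((uint32_t) (exp + 15) << 10) | (mant >> 13);
      uint32_t rem = mant & 0x1FFFu;        /* the 13 discarded bits */
      if (rem > 0x1000u || (rem == 0x1000u && (h & 1)))
        h++;                                /* may carry into the exponent,
                                               correctly producing INF */
      return (uint16_t) (sign | h);
    }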
Re: [PATCH v3] libgcc: Thumb-1 Floating-Point Library for Cortex M0
Hi Christophe, On Mon, Jan 11, 2021, at 8:39 AM, Christophe Lyon wrote: > On Mon, 11 Jan 2021 at 17:18, Daniel Engel wrote: > > > > On Mon, Jan 11, 2021, at 8:07 AM, Christophe Lyon wrote: > > > On Sat, 9 Jan 2021 at 14:09, Christophe Lyon > > > wrote: > > > > > > > > On Sat, 9 Jan 2021 at 13:27, Daniel Engel > > > > wrote: > > > > > > > > > > On Thu, Jan 7, 2021, at 4:56 AM, Richard Earnshaw wrote: > > > > > > On 07/01/2021 00:59, Daniel Engel wrote: > > > > > > > --snip-- > > > > > > > > > > > > > > On Wed, Jan 6, 2021, at 9:05 AM, Richard Earnshaw wrote: > > > > > > > --snip-- > > > > > > > > > > > > > >> - finally, your popcount implementations have data in the code > > > > > > >> segment. > > > > > > >> That's going to cause problems when we have compilation options > > > > > > >> such as > > > > > > >> -mpure-code. > > > > > > > > > > > > > > I am just following the precedent of existing lib1funcs (e.g. > > > > > > > __clz2si). > > > > > > > If this matters, you'll need to point in the right direction for > > > > > > > the > > > > > > > fix. I'm not sure it does matter, since these functions are PIC > > > > > > > anyway. > > > > > > > > > > > > That might be a bug in the clz implementations - Christophe: Any > > > > > > thoughts? > > > > > > > > > > __clzsi2() has test coverage in > > > > > "gcc.c-torture/execute/builtin-bitops-1.c" > > > > Thanks, I'll have a closer look at why I didn't see problems. > > > > > > > > > > So, that's because the code goes to the .text section (as opposed to > > > .text.noread) > > > and does not have the PURECODE flag. The compiler takes care of this > > > when generating code with -mpure-code. > > > And the simulator does not complain because it only checks loads from > > > the segment with the PURECODE flag set. > > > > > This is far out of my depth, but can something like: > > > > ifeq (,$(findstring __symbian__,$(shell $(gcc_compile_bare) -dM -E - > > > > > be adapted to: > > > > a) detect the state of the -mpure-code switch, and > > b) pass that flag to the preprocessor? > > > > If so, I can probably fix both the target section and the data usage. > > Just have to add a few instructions to finish unrolling the loop. > > I must confess I never checked libgcc's Makefile deeply before, > but it looks like you can probably detect whether -mpure-code is > part of $CFLAGS. > > However, it might be better to write pure-code-safe code > unconditionally because the toolchain will probably not > be rebuilt with -mpure-code as discussed before. > Or that could mean adding a -mpure-code multilib I have learned a few things since the last update. I think I know how to get -mpure-code out of CFLAGS and into a macro. However, I have hit something of a wall with testing. I can't seem to compile any flavor of libgcc with CFLAGS_FOR_TARGET="-mpure-code". 1. Configuring --with-multilib-list=rmprofile results in build failure: checking for suffix of object files... configure: error: in `/home/mirdan/gcc-obj/arm-none-eabi/libgcc': configure: error: cannot compute suffix of object files: cannot compile See `config.log' for more details cc1: error: -mpure-code only supports non-pic code on M-profile targets 2. Attempting to filter the multib list results in configuration error. This might have been misguided, but it was something I tried: Error: --with-multilib-list=armv6s-m not supported. Error: --with-multilib-list=mthumb/march=armv6s-m/mfloat-abi=soft not supported 3. Attempting to configure a single architecture results in a build error. 
    --with-mode=thumb --with-arch=armv6s-m --with-float=soft

    checking for suffix of object files... configure: error: in
    `/home/mirdan/gcc-obj/arm-none-eabi/arm/autofp/v5te/fpu/libgcc':
    configure: error: cannot compute suffix of object files: cannot compile
    See `config.log' for more details

    conftest.c:9:10: fatal error: ac_nonexistent.h: No such file or directory
        9 | #include <ac_nonexistent.h>
          |          ^~

This has me wondering whether pure-code in libgcc is a real issue ...

If there's a wa
Re: [PATCH v3] libgcc: Thumb-1 Floating-Point Library for Cortex M0
Hi Christophe, On Fri, Jan 15, 2021, at 4:30 AM, Christophe Lyon wrote: > On Fri, 15 Jan 2021 at 12:39, Daniel Engel wrote: > > > > Hi Christophe, > > > > On Mon, Jan 11, 2021, at 8:39 AM, Christophe Lyon wrote: > > > On Mon, 11 Jan 2021 at 17:18, Daniel Engel wrote: > > > > > > > > On Mon, Jan 11, 2021, at 8:07 AM, Christophe Lyon wrote: > > > > > On Sat, 9 Jan 2021 at 14:09, Christophe Lyon > > > > > wrote: > > > > > > > > > > > > On Sat, 9 Jan 2021 at 13:27, Daniel Engel > > > > > > wrote: > > > > > > > > > > > > > > On Thu, Jan 7, 2021, at 4:56 AM, Richard Earnshaw wrote: > > > > > > > > On 07/01/2021 00:59, Daniel Engel wrote: > > > > > > > > > --snip-- > > > > > > > > > > > > > > > > > > On Wed, Jan 6, 2021, at 9:05 AM, Richard Earnshaw wrote: > > > > > > > > > --snip-- > > > > > > > > > > > > > > > > > >> - finally, your popcount implementations have data in the > > > > > > > > >> code segment. > > > > > > > > >> That's going to cause problems when we have compilation > > > > > > > > >> options such as > > > > > > > > >> -mpure-code. > > > > > > > > > > > > > > > > > > I am just following the precedent of existing lib1funcs (e.g. > > > > > > > > > __clz2si). > > > > > > > > > If this matters, you'll need to point in the right direction > > > > > > > > > for the > > > > > > > > > fix. I'm not sure it does matter, since these functions are > > > > > > > > > PIC anyway. > > > > > > > > > > > > > > > > That might be a bug in the clz implementations - Christophe: > > > > > > > > Any thoughts? > > > > > > > > > > > > > > __clzsi2() has test coverage in > > > > > > > "gcc.c-torture/execute/builtin-bitops-1.c" > > > > > > Thanks, I'll have a closer look at why I didn't see problems. > > > > > > > > > > > > > > > > So, that's because the code goes to the .text section (as opposed to > > > > > .text.noread) > > > > > and does not have the PURECODE flag. The compiler takes care of this > > > > > when generating code with -mpure-code. > > > > > And the simulator does not complain because it only checks loads from > > > > > the segment with the PURECODE flag set. > > > > > > > > > This is far out of my depth, but can something like: > > > > > > > > ifeq (,$(findstring __symbian__,$(shell $(gcc_compile_bare) -dM -E - > > > > > > > > > > > be adapted to: > > > > > > > > a) detect the state of the -mpure-code switch, and > > > > b) pass that flag to the preprocessor? > > > > > > > > If so, I can probably fix both the target section and the data usage. > > > > Just have to add a few instructions to finish unrolling the loop. > > > > > > I must confess I never checked libgcc's Makefile deeply before, > > > but it looks like you can probably detect whether -mpure-code is > > > part of $CFLAGS. > > > > > > However, it might be better to write pure-code-safe code > > > unconditionally because the toolchain will probably not > > > be rebuilt with -mpure-code as discussed before. > > > Or that could mean adding a -mpure-code multilib > > > > I have learned a few things since the last update. I think I know how > > to get -mpure-code out of CFLAGS and into a macro. However, I have hit > > something of a wall with testing. I can't seem to compile any flavor of > > libgcc with CFLAGS_FOR_TARGET="-mpure-code". > > > > 1. Configuring --with-multilib-list=rmprofile results in build failure: > > > > checking for suffix of object files... 
> > configure: error: in
> > `/home/mirdan/gcc-obj/arm-none-eabi/libgcc':
> > configure: error: cannot compute suffix of object files: cannot compile
> > See `config.log' for more details
> >
> > cc1: error: -mpure-code only
[PING] Re: [PATCH v7 00/34] libgcc: Thumb-1 Floating-Point Assembly for Cortex M0
Hello,

Is there still any interest in merging this patch?

Thanks,
Daniel

On Mon, Oct 31, 2022, at 8:44 AM, Daniel Engel wrote:
> Hi Richard,
>
> I am re-submitting my libgcc patch from 2021:
>
> https://gcc.gnu.org/pipermail/gcc-patches/2021-January/563585.html
> https://gcc.gnu.org/pipermail/gcc-patches/2021-December/587383.html
>
> I believe I have finally made the stage1 window.
>
> Regards,
> Daniel
>
> ---
>
> Changes since v6:
>
> * Rebased and tested with gcc-13
>
> There are no regressions for -march={armv4t,armv6s-m,armv7-m,armv7-a}.
> Clean master:
>
>     # of expected passes        529397
>     # of unexpected failures    41160
>     # of unexpected successes   12
>     # of expected failures      3442
>     # of unresolved testcases   978
>     # of unsupported tests      28993
>
> Patched master:
>
>     # of expected passes        529397
>     # of unexpected failures    41160
>     # of unexpected successes   12
>     # of expected failures      3442
>     # of unresolved testcases   978
>     # of unsupported tests      28993
>
> ---
>
> This patch series adds an assembly-language implementation of IEEE-754
> compliant single-precision functions designed for the Cortex M0 (v6m)
> architecture.  There are improvements to most of the EABI integer
> functions as well.  This is the libgcc component of a larger library
> project originally proposed in 2018:
>
> https://gcc.gnu.org/legacy-ml/gcc/2018-11/msg00043.html
>
> As one point of comparison, a test program [1] links 916 bytes from
> libgcc with the patched toolchain vs 10276 bytes with the
> gcc-arm-none-eabi-9-2020-q2 toolchain.  That's a 90% size reduction.
>
> I have extensive test vectors [2], and this patch passes all tests on
> an STM32F051.  These vectors were derived from UCB [3], Testfloat [4],
> and IEEECC754 [5], plus many of my own generation.
>
> There may be some follow-on projects worth discussing:
>
> * The library is currently integrated into the ARM v6s-m multilib only.
>   It is likely that some other architectures would benefit from these
>   routines.  However, I have NOT profiled the existing implementations
>   (ieee754-sf.S) to estimate where improvements may be found.
>
> * GCC currently lacks tests for some functions, such as
>   __aeabi_[u]ldivmod().  There may be useful bits in [1] that can be
>   integrated.
>
> On Cortex M0, the library has (approximately) the following properties:
>
> Function(s)                     Size (bytes)    Cycles          Stack   Accuracy
> __clzsi2                        50              20              0       exact
> __clzsi2 (OPTIMIZE_SIZE)        22              51              0       exact
> __clzdi2                        8+__clzsi2      4+__clzsi2      0       exact
>
> __clrsbsi2                      8+__clzsi2      6+__clzsi2      0       exact
> __clrsbdi2                      18+__clzsi2     (8..10)+__clzsi2 0      exact
>
> __ctzsi2                        52              21              0       exact
> __ctzsi2 (OPTIMIZE_SIZE)        24              52              0       exact
> __ctzdi2                        8+__ctzsi2      5+__ctzsi2      0       exact
>
> __ffssi2                        8               6..(5+__ctzsi2) 0       exact
> __ffsdi2                        14+__ctzsi2     9..(8+__ctzsi2) 0       exact
>
> __popcountsi2                   52              25              0       exact
> __popcountsi2 (OPTIMIZE_SIZE)   14              9..201          0       exact
> __popcountdi2                   34+__popcountsi2 46             0       exact
> __popcountdi2 (OPTIMIZE_SIZE)   12+__popcountsi2 17..401        0       exact
>
> __paritysi2                     24              14              0       exact
> __paritysi2 (OPTIMIZE_SIZE)     16              38              0       exact
> __paritydi2                     2+__paritysi2   1+__paritysi2   0       exact
>
> __umulsidi3                     44              24              0       exact
> __mulsidi3                      30+__umulsidi3  24+__umulsidi3  8       exact
> __muldi3 (__aeabi_lmul)         10+__umulsidi3  6+__umulsidi3   0       exact
> __ashldi3 (__aeabi_llsl)        22              13              0       exact
> __lshrdi3 (__aeabi_llsr)        22              13
[PATCH v6 00/34] libgcc: Thumb-1 Floating-Point Assembly for Cortex M0
)                               54+__udivdi3    36+__udivdi3    32      < 1 lsb

__shared_float                  178
__shared_float (OPTIMIZE_SIZE)  154

__addsf3 (__aeabi_fadd)         116+__shared_float 31..76       8       <= 0.5 ulp
__addsf3 (OPTIMIZE_SIZE)        112+__shared_float 74           8       <= 0.5 ulp
__subsf3 (__aeabi_fsub)         6+__addsf3      3+__addsf3      8       <= 0.5 ulp
__aeabi_frsub                   8+__addsf3      6+__addsf3      8       <= 0.5 ulp
__mulsf3 (__aeabi_fmul)         112+__shared_float 73..97       8       <= 0.5 ulp
__mulsf3 (OPTIMIZE_SIZE)        96+__shared_float 93            8       <= 0.5 ulp
__divsf3 (__aeabi_fdiv)         132+__shared_float 83..361      8       <= 0.5 ulp
__divsf3 (OPTIMIZE_SIZE)        120+__shared_float 263..359     8       <= 0.5 ulp

__cmpsf2/__lesf2/__ltsf2        72              33              0       exact
__eqsf2/__nesf2                 4+__cmpsf2      3+__cmpsf2      0       exact
__gesf2/__gtsf2                 4+__cmpsf2      3+__cmpsf2      0       exact
__unordsf2 (__aeabi_fcmpun)     4+__cmpsf2      3+__cmpsf2      0       exact
__aeabi_fcmpeq                  4+__cmpsf2      3+__cmpsf2      0       exact
__aeabi_fcmpne                  4+__cmpsf2      3+__cmpsf2      0       exact
__aeabi_fcmplt                  4+__cmpsf2      3+__cmpsf2      0       exact
__aeabi_fcmple                  4+__cmpsf2      3+__cmpsf2      0       exact
__aeabi_fcmpge                  4+__cmpsf2      3+__cmpsf2      0       exact

__floatundisf (__aeabi_ul2f)    14+__shared_float 40..81        8       <= 0.5 ulp
__floatundisf (OPTIMIZE_SIZE)   14+__shared_float 40..237       8       <= 0.5 ulp
__floatunsisf (__aeabi_ui2f)    0+__floatundisf 1+__floatundisf 8       <= 0.5 ulp
__floatdisf (__aeabi_l2f)       14+__floatundisf 7+__floatundisf 8      <= 0.5 ulp
__floatsisf (__aeabi_i2f)       0+__floatdisf   1+__floatdisf   8       <= 0.5 ulp

__fixsfdi (__aeabi_f2lz)        74              27..33          0       exact
__fixunssfdi (__aeabi_f2ulz)    4+__fixsfdi     3+__fixsfdi     0       exact
__fixsfsi (__aeabi_f2iz)        52              19              0       exact
__fixsfsi (OPTIMIZE_SIZE)       4+__fixsfdi     3+__fixsfdi     0       exact
__fixunssfsi (__aeabi_f2uiz)    4+__fixsfsi     3+__fixsfsi     0       exact

__extendsfdf2 (__aeabi_f2d)     42+__shared_float 38            8       exact
__truncdfsf2                    88              34              8       exact
__aeabi_d2f                     56+__shared_float 54..58        8       <= 0.5 ulp
__aeabi_h2f                     34+__shared_float 34            8       exact
__aeabi_f2h                     84              23..34          0       <= 0.5 ulp

Copyright assignment is on file with the FSF.

Thanks,
Daniel Engel

[1] // Test program for size comparison
    extern int main (void)
    {
        volatile int x = 1;
        volatile unsigned long long int y = 10;
        volatile long long int z = x / y; // 64-bit division

        volatile float a = x; // 32-bit casting
        volatile float b = y; // 64 bit casting
        volatile float c = z / b; // float division
        volatile float d = a + c; // float addition
        volatile float e = c * b; // float multiplication
        volatile float f = d - e - c; // float subtraction

        if (f != c) // float comparison
            y -= (long long int)d; // float casting
    }

[2] http://danielengel.com/cm0_test_vectors.tgz
[3] http://www.netlib.org/fp/ucbtest.tgz
[4] http://www.jhauser.us/arithmetic/TestFloat.html
[5] http://win-www.uia.ac.be/u/cant/ieeecc754.html
--
2.25.1
[PATCH v6 01/34] Add and restructure function declaration macros
Most of these changes support subsequent patches in this series.
Particularly, the FUNC_START macro becomes part of a new macro chain:

    * FUNC_ENTRY            Common global symbol directives
    * FUNC_START_SECTION    FUNC_ENTRY to start a new section
    * FUNC_START            FUNC_START_SECTION <".text">

The effective definition of FUNC_START is unchanged from the previous
version of lib1funcs.  See code comments for detailed usage.

The new names FUNC_ENTRY and FUNC_START_SECTION were chosen specifically
to complement the existing FUNC_START name.  Alternate name patterns are
possible (such as {FUNC_SYMBOL, FUNC_START_SECTION, FUNC_START_TEXT}),
but any change to FUNC_START would require refactoring much of libgcc.

Additionally, a parallel chain of new macros supports weak functions:

    * WEAK_ENTRY
    * WEAK_START_SECTION
    * WEAK_START
    * WEAK_ALIAS

Moving the CFI_* macros earlier in the file increases their scope for
use in additional functions.

gcc/libgcc/ChangeLog:
2021-01-14 Daniel Engel 

	* config/arm/lib1funcs.S:
	(LLSYM): New macro prefix ".L" for strippable local symbols.
	(CFI_START_FUNCTION, CFI_END_FUNCTION): Moved earlier in the file.
	(FUNC_ENTRY): New macro for symbols with no ".section" directive.
	(WEAK_ENTRY): New macro FUNC_ENTRY + ".weak".
	(FUNC_START_SECTION): New macro FUNC_ENTRY with a section argument.
	(WEAK_START_SECTION): New macro FUNC_START_SECTION + ".weak".
	(FUNC_START): Redefined in terms of FUNC_START_SECTION <".text">.
	(WEAK_START): New macro FUNC_START + ".weak".
	(WEAK_ALIAS): New macro FUNC_ALIAS + ".weak".
	(FUNC_END): Moved after FUNC_START macro group.
	(THUMB_FUNC_START): Moved near the other *FUNC* macros.
	(THUMB_SYNTAX, ARM_SYM_START, SYM_END): Deleted unused macros.
---
 libgcc/config/arm/lib1funcs.S | 109 +-
 1 file changed, 69 insertions(+), 40 deletions(-)

diff --git a/libgcc/config/arm/lib1funcs.S b/libgcc/config/arm/lib1funcs.S
index c2fcfc503ec..f14662d7e15 100644
--- a/libgcc/config/arm/lib1funcs.S
+++ b/libgcc/config/arm/lib1funcs.S
@@ -69,11 +69,13 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 #define TYPE(x) .type SYM(x),function
 #define SIZE(x) .size SYM(x), . - SYM(x)
 #define LSYM(x) .x
+#define LLSYM(x) .L##x
 #else
 #define __PLT__
 #define TYPE(x)
 #define SIZE(x)
 #define LSYM(x) x
+#define LLSYM(x) x
 #endif
 
 /* Function end macros.  Variants for interworking.  */
@@ -182,6 +184,16 @@ LSYM(Lend_fde):
 #endif
 .endm
 
+.macro CFI_START_FUNCTION
+	.cfi_startproc
+	.cfi_remember_state
+.endm
+
+.macro CFI_END_FUNCTION
+	.cfi_restore_state
+	.cfi_endproc
+.endm
+
 /* Don't pass dirn, it's there just to get token pasting right.  */
 
 .macro RETLDM regs=, cond=, unwind=, dirn=ia
@@ -324,10 +336,6 @@ LSYM(Lend_fde):
 .endm
 #endif
 
-.macro FUNC_END name
-	SIZE (__\name)
-.endm
-
 .macro DIV_FUNC_END name signed
 	cfi_start __\name, LSYM(Lend_div0)
 LSYM(Ldiv0):
@@ -340,48 +348,76 @@ LSYM(Ldiv0):
 	FUNC_END \name
 .endm
 
-.macro THUMB_FUNC_START name
-	.globl	SYM (\name)
-	TYPE(\name)
-	.thumb_func
-SYM (\name):
-.endm
-
 /* Function start macros.  Variants for ARM and Thumb.  */
 
 #ifdef __thumb__
 #define THUMB_FUNC .thumb_func
 #define THUMB_CODE .force_thumb
-# if defined(__thumb2__)
-#define THUMB_SYNTAX
-# else
-#define THUMB_SYNTAX
-# endif
 #else
 #define THUMB_FUNC
 #define THUMB_CODE
-#define THUMB_SYNTAX
 #endif
 
+.macro THUMB_FUNC_START name
+	.globl	SYM (\name)
+	TYPE(\name)
+	.thumb_func
+SYM (\name):
+.endm
+
+/* Strong global symbol, ".text" section.
+   The default macro for function declarations.  */
 .macro FUNC_START name
-	.text
+	FUNC_START_SECTION \name .text
+.endm
+
+/* Weak global symbol, ".text" section.
+   Use WEAK_* macros to declare a function/object that may be discarded by
+    the linker when another library or object exports the same name.
+   Typically, functions declared with WEAK_* macros implement a subset of
+    functionality provided by the overriding definition, and are discarded
+    when the full functionality is required.  */
+.macro WEAK_START name
+	.weak SYM(__\name)
+	FUNC_START_SECTION \name .text
+.endm
+
+/* Strong global symbol, alternate section.
+   Use the *_START_SECTION macros for declarations that the linker should
+    place in a non-default section (e.g. ".rodata", ".text.subsection").  */
+.macro FUNC_START_SECTION name section
+	.section \section,"x"
+	.align 0
+	FUNC_ENTRY \name
+.endm
+
+/* Weak global symbol, alternate section.  */
+.macro WEAK_START_SECTION name section
+	.weak SYM(__\name)
+	FUNC_START_SECTION \name
[PATCH v6 02/34] Rename THUMB_FUNC_START to THUMB_FUNC_ENTRY
Since THUMB_FUNC_START does not insert the ".text" directive, it aligns
more closely with the new FUNC_ENTRY macro and is renamed accordingly.

THUMB_FUNC_START usage has been universally synonymous with the
".force_thumb" directive, so this is now folded into the definition.
Usage of ".force_thumb" and ".thumb_func" is now tightly coupled
throughout the "arm" subdirectory.

gcc/libgcc/ChangeLog:
2021-01-14 Daniel Engel 

	* config/arm/lib1funcs.S:
	(THUMB_FUNC_START): Renamed to ...
	(THUMB_FUNC_ENTRY): for consistency; also added ".force_thumb".
	(_call_via_r0): Removed redundant preceding ".force_thumb".
	(__gnu_thumb1_case_sqi, __gnu_thumb1_case_uqi, __gnu_thumb1_case_shi,
	__gnu_thumb1_case_si): Removed redundant ".force_thumb" and ".syntax".
---
 libgcc/config/arm/lib1funcs.S | 32 +++-
 1 file changed, 11 insertions(+), 21 deletions(-)

diff --git a/libgcc/config/arm/lib1funcs.S b/libgcc/config/arm/lib1funcs.S
index f14662d7e15..65d070d8178 100644
--- a/libgcc/config/arm/lib1funcs.S
+++ b/libgcc/config/arm/lib1funcs.S
@@ -358,10 +358,11 @@ LSYM(Ldiv0):
 #define THUMB_CODE
 #endif
 
-.macro THUMB_FUNC_START name
+.macro THUMB_FUNC_ENTRY name
 	.globl	SYM (\name)
 	TYPE(\name)
 	.thumb_func
+	.force_thumb
 SYM (\name):
 .endm
 
@@ -1944,10 +1945,9 @@ ARM_FUNC_START ctzsi2
 
 	.text
 	.align 0
-.force_thumb
 
 .macro call_via register
-	THUMB_FUNC_START _call_via_\register
+	THUMB_FUNC_ENTRY _call_via_\register
 
 	bx	\register
 	nop
@@ -2030,7 +2030,7 @@ _arm_return_r11:
 .macro interwork_with_frame frame, register, name, return
 	.code	16
 
-	THUMB_FUNC_START \name
+	THUMB_FUNC_ENTRY \name
 
 	bx	pc
 	nop
@@ -2047,7 +2047,7 @@ _arm_return_r11:
 .macro interwork register
 	.code	16
 
-	THUMB_FUNC_START _interwork_call_via_\register
+	THUMB_FUNC_ENTRY _interwork_call_via_\register
 
 	bx	pc
 	nop
@@ -2084,7 +2084,7 @@ LSYM(Lchange_\register):
 	/* The LR case has to be handled a little differently...  */
 	.code 16
 
-	THUMB_FUNC_START _interwork_call_via_lr
+	THUMB_FUNC_ENTRY _interwork_call_via_lr
 
 	bx	pc
 	nop
@@ -2112,9 +2112,7 @@ LSYM(Lchange_\register):
 
 	.text
 	.align 0
-	.force_thumb
-	.syntax unified
-	THUMB_FUNC_START __gnu_thumb1_case_sqi
+	THUMB_FUNC_ENTRY __gnu_thumb1_case_sqi
 	push	{r1}
 	mov	r1, lr
 	lsrs	r1, r1, #1
@@ -2131,9 +2129,7 @@ LSYM(Lchange_\register):
 
 	.text
 	.align 0
-	.force_thumb
-	.syntax unified
-	THUMB_FUNC_START __gnu_thumb1_case_uqi
+	THUMB_FUNC_ENTRY __gnu_thumb1_case_uqi
 	push	{r1}
 	mov	r1, lr
 	lsrs	r1, r1, #1
@@ -2150,9 +2146,7 @@ LSYM(Lchange_\register):
 
 	.text
 	.align 0
-	.force_thumb
-	.syntax unified
-	THUMB_FUNC_START __gnu_thumb1_case_shi
+	THUMB_FUNC_ENTRY __gnu_thumb1_case_shi
 	push	{r0, r1}
 	mov	r1, lr
 	lsrs	r1, r1, #1
@@ -2170,9 +2164,7 @@ LSYM(Lchange_\register):
 
 	.text
 	.align 0
-	.force_thumb
-	.syntax unified
-	THUMB_FUNC_START __gnu_thumb1_case_uhi
+	THUMB_FUNC_ENTRY __gnu_thumb1_case_uhi
 	push	{r0, r1}
 	mov	r1, lr
 	lsrs	r1, r1, #1
@@ -2190,9 +2182,7 @@ LSYM(Lchange_\register):
 
 	.text
 	.align 0
-	.force_thumb
-	.syntax unified
-	THUMB_FUNC_START __gnu_thumb1_case_si
+	THUMB_FUNC_ENTRY __gnu_thumb1_case_si
 	push	{r0, r1}
 	mov	r1, lr
 	adds.n	r1, r1, #2	/* Align to word.  */
--
2.25.1
[PATCH v6 03/34] Fix syntax warnings on conditional instructions
gcc/libgcc/ChangeLog: 2021-01-14 Daniel Engel * config/arm/lib1funcs.S (RETLDM, ARM_DIV_BODY, ARM_MOD_BODY, _interwork_call_via_lr): Moved condition code after the flags update specifier "s". (ARM_FUNC_START, THUMB_LDIV0): Removed redundant ".syntax". --- libgcc/config/arm/lib1funcs.S | 12 +--- 1 file changed, 5 insertions(+), 7 deletions(-) diff --git a/libgcc/config/arm/lib1funcs.S b/libgcc/config/arm/lib1funcs.S index 65d070d8178..b8693be8e4f 100644 --- a/libgcc/config/arm/lib1funcs.S +++ b/libgcc/config/arm/lib1funcs.S @@ -204,7 +204,7 @@ LSYM(Lend_fde): # if defined(__thumb2__) pop\cond{\regs, lr} # else - ldm\cond\dirn sp!, {\regs, lr} + ldm\dirn\cond sp!, {\regs, lr} # endif .endif .ifnc "\unwind", "" @@ -220,7 +220,7 @@ LSYM(Lend_fde): # if defined(__thumb2__) pop\cond{\regs, pc} # else - ldm\cond\dirn sp!, {\regs, pc} + ldm\dirn\cond sp!, {\regs, pc} # endif .endif #endif @@ -292,7 +292,6 @@ LSYM(Lend_fde): pop {r1, pc} #elif defined(__thumb2__) - .syntax unified .ifc \signed, unsigned cbz r0, 1f mov r0, #0x @@ -429,7 +428,6 @@ SYM (__\name): /* For Thumb-2 we build everything in thumb mode. */ .macro ARM_FUNC_START name FUNC_START \name - .syntax unified .endm #define EQUIV .thumb_set .macro ARM_CALL name @@ -643,7 +641,7 @@ pc .reqr15 orrhs \result, \result, \curbit, lsr #3 cmp \dividend, #0 @ Early termination? do_it ne, t - movnes \curbit, \curbit, lsr #4 @ No, any more bits to do? + movsne \curbit, \curbit, lsr #4 @ No, any more bits to do? movne \divisor, \divisor, lsr #4 bne 1b @@ -745,7 +743,7 @@ pc .reqr15 subhs \dividend, \dividend, \divisor, lsr #3 cmp \dividend, #1 mov \divisor, \divisor, lsr #4 - subges \order, \order, #4 + subsge \order, \order, #4 bge 1b tst \order, #3 @@ -2093,7 +2091,7 @@ LSYM(Lchange_\register): .globl .Lchange_lr .Lchange_lr: tst lr, #1 - stmeqdb r13!, {lr, pc} + stmdbeq r13!, {lr, pc} mov ip, lr adreq lr, _arm_return bx ip -- 2.25.1
[PATCH v6 04/34] Reorganize LIB1ASMFUNCS object wrapper macros
This will make it easier to isolate changes in subsequent patches. gcc/libgcc/ChangeLog: 2021-01-14 Daniel Engel * config/arm/t-elf (LIB1ASMFUNCS): Split macros into logical groups. --- libgcc/config/arm/t-elf | 66 + 1 file changed, 53 insertions(+), 13 deletions(-) diff --git a/libgcc/config/arm/t-elf b/libgcc/config/arm/t-elf index 9da6cd37054..93ea1cd8f76 100644 --- a/libgcc/config/arm/t-elf +++ b/libgcc/config/arm/t-elf @@ -14,19 +14,59 @@ LIB1ASMFUNCS += _arm_muldf3 _arm_mulsf3 endif endif # !__symbian__ -# For most CPUs we have an assembly soft-float implementations. -# However this is not true for ARMv6M. Here we want to use the soft-fp C -# implementation. The soft-fp code is only build for ARMv6M. This pulls -# in the asm implementation for other CPUs. -LIB1ASMFUNCS += _udivsi3 _divsi3 _umodsi3 _modsi3 _dvmd_tls _bb_init_func \ - _call_via_rX _interwork_call_via_rX \ - _lshrdi3 _ashrdi3 _ashldi3 \ - _arm_negdf2 _arm_addsubdf3 _arm_muldivdf3 _arm_cmpdf2 _arm_unorddf2 \ - _arm_fixdfsi _arm_fixunsdfsi \ - _arm_truncdfsf2 _arm_negsf2 _arm_addsubsf3 _arm_muldivsf3 \ - _arm_cmpsf2 _arm_unordsf2 _arm_fixsfsi _arm_fixunssfsi \ - _arm_floatdidf _arm_floatdisf _arm_floatundidf _arm_floatundisf \ - _clzsi2 _clzdi2 _ctzsi2 +# This pulls in the available assembly function implementations. +# The soft-fp code is only built for ARMv6M, since there is no +# assembly implementation here for double-precision values. + + +# Group 1: Integer function objects. +LIB1ASMFUNCS += \ + _ashldi3 \ + _ashrdi3 \ + _lshrdi3 \ + _clzdi2 \ + _clzsi2 \ + _ctzsi2 \ + _dvmd_tls \ + _divsi3 \ + _modsi3 \ + _udivsi3 \ + _umodsi3 \ + + +# Group 2: Single precision floating point function objects. +LIB1ASMFUNCS += \ + _arm_addsubsf3 \ + _arm_cmpsf2 \ + _arm_fixsfsi \ + _arm_fixunssfsi \ + _arm_floatdisf \ + _arm_floatundisf \ + _arm_muldivsf3 \ + _arm_negsf2 \ + _arm_unordsf2 \ + + +# Group 3: Double precision floating point function objects. +LIB1ASMFUNCS += \ + _arm_addsubdf3 \ + _arm_cmpdf2 \ + _arm_fixdfsi \ + _arm_fixunsdfsi \ + _arm_floatdidf \ + _arm_floatundidf \ + _arm_muldivdf3 \ + _arm_negdf2 \ + _arm_truncdfsf2 \ + _arm_unorddf2 \ + + +# Group 4: Miscellaneous function objects. +LIB1ASMFUNCS += \ + _bb_init_func \ + _call_via_rX \ + _interwork_call_via_rX \ + # Currently there is a bug somewhere in GCC's alias analysis # or scheduling code that is breaking _fpmul_parts in fp-bit.c. -- 2.25.1
[PATCH v6 05/34] Add the __HAVE_FEATURE_IT and IT() macros
These macros complement and extend the existing do_it() macro.
Together, they streamline the process of optimizing short branchless
conditional sequences to support ARM, Thumb-2, and Thumb-1.

The inherent architecture limitations of Thumb-1 mean that writing
assembly code is somewhat more tedious.  And, while such code will run
unmodified in an ARM or Thumb-2 environment, it will lack one of the key
performance optimizations available there.

Initially, the idea might be to split an instruction sequence with
#ifdef(s): one path for Thumb-1 and the other for ARM/Thumb-2.  This
could suffice if conditional execution optimizations were rare.
However, #ifdef(s) break the flow of an algorithm and shift focus to the
architectural differences instead of the similarities.  On functions
with a high percentage of conditional execution, it starts to become
attractive to split everything into distinct architecture-specific
function objects -- even when the underlying algorithm is identical.
Additionally, duplicated code and comments (whether an individual
operand, a line, or a larger block) become a future maintenance
liability if the two versions aren't kept in sync.

See code comments for limitations and expected usage.

gcc/libgcc/ChangeLog:
2021-01-14 Daniel Engel 

	* config/arm/lib1funcs.S (__HAVE_FEATURE_IT, IT): New macros.
---
 libgcc/config/arm/lib1funcs.S | 68 +++
 1 file changed, 68 insertions(+)

diff --git a/libgcc/config/arm/lib1funcs.S b/libgcc/config/arm/lib1funcs.S
index b8693be8e4f..1233b8c0992 100644
--- a/libgcc/config/arm/lib1funcs.S
+++ b/libgcc/config/arm/lib1funcs.S
@@ -230,6 +230,7 @@ LSYM(Lend_fde):
    ARM and Thumb-2.  However this is only supported by recent gas, so define
    a set of macros to allow ARM code on older assemblers.  */
 #if defined(__thumb2__)
+#define __HAVE_FEATURE_IT
 .macro do_it cond, suffix=""
 	it\suffix	\cond
 .endm
@@ -245,6 +246,9 @@ LSYM(Lend_fde):
 	\name \dest, \src1, \tmp
 .endm
 #else
+#if !defined(__thumb__)
+#define __HAVE_FEATURE_IT
+#endif
 .macro do_it cond, suffix=""
 .endm
 .macro shift1 op, arg0, arg1, arg2
@@ -259,6 +263,70 @@ LSYM(Lend_fde):
 
 #define COND(op1, op2, cond) op1 ## op2 ## cond
 
+
+/* The IT() macro streamlines the construction of short branchless conditional
+    sequences that support ARM, Thumb-2, and Thumb-1.  It is intended as an
+    extension to the .do_it macro defined above.  Code not written with the
+    intent to support Thumb-1 need not use IT().
+
+   IT()'s main advantage is the minimization of syntax differences.  Unified
+    functions can support Thumb-1 without imposing an undue performance
+    penalty on ARM and Thumb-2.  Writing code without duplicate instructions
+    and operands keeps the high level function flow clearer and should reduce
+    the incidence of maintenance bugs.
+
+   Where conditional execution is supported by ARM and Thumb-2, the specified
+    instruction compiles with the conditional suffix 'c'.
+
+   Where Thumb-1 and v6m do not support IT, the given instruction compiles
+    with the standard unified syntax suffix "s", and a preceding branch
+    instruction is required to implement conditional behavior.
+
+   (Aside: The Thumb-1 "s"-suffix pattern is somewhat simplistic, since it
+    does not support 'cmp' or 'tst' with a non-"s" suffix.  It also appends
+    "s" to 'mov' and 'add' with high register operands which are otherwise
+    legal on v6m.  Use of IT() will result in a compiler error for all of
+    these exceptional cases, and a full #ifdef code split will be required.
+However, it is unlikely that code written with Thumb-1 compatibility +in mind will use such patterns, so IT() still promises a good value.) + + Typical if/then/else usage is: + +#ifdef __HAVE_FEATURE_IT +// ARM and Thumb-2 'true' condition. +do_it c, tee +#else +// Thumb-1 'false' condition. This must be opposite the +// sense of the ARM and Thumb-2 condition, since the +// branch is taken to skip the 'true' instruction block. +b!c else_label +#endif + +// Conditional 'true' execution for all compile modes. + IT(ins1,c) op1,op2 + IT(ins2,c) op1,op2 + +#ifndef __HAVE_FEATURE_IT +// Thumb-1 branch to skip the 'else' instruction block. +// Omitted for if/then usage. +b end_label +#endif + + else_label: +// Conditional 'false' execution for all compile modes. +// Omitted for if/then usage. + IT(ins3,!c) op1, op2 + IT(ins4,!c) op1, op2 + + end_label: +// Unconditional execution resumes here. + */ +#ifdef __HAVE_FEATURE_IT + #define IT(ins,c) ins##c +#else + #define IT(ins,c) ins##s +#endif + #ifdef __ARM_EABI__ .macro ARM_LDIV0 name signed cmp r0, #0 -- 2.25.1
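As a C-level illustration of what these macros abstract (an editorial
example, not part of the patch): the same conditional source line is
branchless on ARM/Thumb-2 but needs a real branch on Thumb-1, which is
exactly the split that IT() hides behind one assembly body.

    /* A conditional update like this one typically compiles to
       "it lt; movlt r0, #0" under Thumb-2, but to something like
       "bge .Lskip; movs r0, #0; .Lskip:" under Thumb-1 -- the two
       shapes IT() lets a single source line express.  */
    int clamp_to_zero (int x)
    {
      if (x < 0)
        x = 0;
      return x;
    }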
[PATCH v6 06/34] Refactor 'clz' functions into a new file
This will make it easier to isolate changes in subsequent patches.

gcc/libgcc/ChangeLog:
2021-01-13 Daniel Engel 

	* config/arm/lib1funcs.S (__clzsi2, __clzdi2): Moved to ...
	* config/arm/clz2.S: New file.
---
 libgcc/config/arm/clz2.S      | 145 ++
 libgcc/config/arm/lib1funcs.S | 123 +---
 2 files changed, 146 insertions(+), 122 deletions(-)
 create mode 100644 libgcc/config/arm/clz2.S

diff --git a/libgcc/config/arm/clz2.S b/libgcc/config/arm/clz2.S
new file mode 100644
index 000..2ad9a81892c
--- /dev/null
+++ b/libgcc/config/arm/clz2.S
@@ -0,0 +1,145 @@
+/* Copyright (C) 1995-2021 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 3, or (at your option) any
+later version.
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+General Public License for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+<http://www.gnu.org/licenses/>.  */
+
+
+#ifdef L_clzsi2
+#ifdef NOT_ISA_TARGET_32BIT
+FUNC_START clzsi2
+	movs	r1, #28
+	movs	r3, #1
+	lsls	r3, r3, #16
+	cmp	r0, r3 /* 0x1 */
+	bcc	2f
+	lsrs	r0, r0, #16
+	subs	r1, r1, #16
+2:	lsrs	r3, r3, #8
+	cmp	r0, r3 /* #0x100 */
+	bcc	2f
+	lsrs	r0, r0, #8
+	subs	r1, r1, #8
+2:	lsrs	r3, r3, #4
+	cmp	r0, r3 /* #0x10 */
+	bcc	2f
+	lsrs	r0, r0, #4
+	subs	r1, r1, #4
+2:	adr	r2, 1f
+	ldrb	r0, [r2, r0]
+	adds	r0, r0, r1
+	bx	lr
+.align 2
+1:
+.byte	4, 3, 2, 2, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0
+	FUNC_END clzsi2
+#else
+ARM_FUNC_START clzsi2
+# if defined (__ARM_FEATURE_CLZ)
+	clz	r0, r0
+	RET
+# else
+	mov	r1, #28
+	cmp	r0, #0x1
+	do_it	cs, t
+	movcs	r0, r0, lsr #16
+	subcs	r1, r1, #16
+	cmp	r0, #0x100
+	do_it	cs, t
+	movcs	r0, r0, lsr #8
+	subcs	r1, r1, #8
+	cmp	r0, #0x10
+	do_it	cs, t
+	movcs	r0, r0, lsr #4
+	subcs	r1, r1, #4
+	adr	r2, 1f
+	ldrb	r0, [r2, r0]
+	add	r0, r0, r1
+	RET
+.align 2
+1:
+.byte	4, 3, 2, 2, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0
+# endif /* !defined (__ARM_FEATURE_CLZ) */
+	FUNC_END clzsi2
+#endif
+#endif /* L_clzsi2 */
+
+#ifdef L_clzdi2
+#if !defined (__ARM_FEATURE_CLZ)
+
+# ifdef NOT_ISA_TARGET_32BIT
+FUNC_START clzdi2
+	push	{r4, lr}
+	cmp	xxh, #0
+	bne	1f
+# ifdef __ARMEB__
+	movs	r0, xxl
+	bl	__clzsi2
+	adds	r0, r0, #32
+	b	2f
+1:
+	bl	__clzsi2
+# else
+	bl	__clzsi2
+	adds	r0, r0, #32
+	b	2f
+1:
+	movs	r0, xxh
+	bl	__clzsi2
+# endif
+2:
+	pop	{r4, pc}
+# else /* NOT_ISA_TARGET_32BIT */
+ARM_FUNC_START clzdi2
+	do_push	{r4, lr}
+	cmp	xxh, #0
+	bne	1f
+# ifdef __ARMEB__
+	mov	r0, xxl
+	bl	__clzsi2
+	add	r0, r0, #32
+	b	2f
+1:
+	bl	__clzsi2
+# else
+	bl	__clzsi2
+	add	r0, r0, #32
+	b	2f
+1:
+	mov	r0, xxh
+	bl	__clzsi2
+# endif
+2:
+	RETLDM	r4
+	FUNC_END clzdi2
+# endif /* NOT_ISA_TARGET_32BIT */
+
+#else /* defined (__ARM_FEATURE_CLZ) */
+
+ARM_FUNC_START clzdi2
+	cmp	xxh, #0
+	do_it	eq, et
+	clzeq	r0, xxl
+	clzne	r0, xxh
+	addeq	r0, r0, #32
+	RET
+	FUNC_END clzdi2
+
+#endif
+#endif /* L_clzdi2 */
+
diff --git a/libgcc/config/arm/lib1funcs.S b/libgcc/config/arm/lib1funcs.S
index 1233b8c0992..d92f73ba0c9
100644
--- a/libgcc/config/arm/lib1funcs.S
+++ b/libgcc/config/arm/lib1funcs.S
@@ -1803,128 +1803,7 @@ LSYM(Lover12):
 #endif /* __symbian__ */
 
-#ifdef L_clzsi2
-#ifdef NOT_ISA_TARGET_32BIT
-FUNC_START clzsi2
-	movs	r1, #28
-	movs	r3, #1
-	lsls	r3, r3, #16
-	cmp	r0, r3 /* 0x1 */
-	bcc	2f
-	lsrs	r0, r0, #16
-	subs	r1, r1, #16
-2:	lsrs	r3, r3, #8
-	cmp	r0, r3 /* #0x100 */
-	bcc	2f
-	lsrs	r0, r0, #8
-	subs	r1, r1, #8
-2:	lsrs	r3, r3, #4
-	cmp	r0, r3 /* #0x10 */
-	bcc	2f
-	lsrs
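For reference while reading the relocated code: the Thumb-1 __clzsi2
above is a three-step binary narrowing followed by a 16-entry byte
table, which behaves like this C model (hypothetical name):

    #include <stdint.h>

    /* C model of the table-assisted __clzsi2 above: narrow the argument
       with three compare/shift steps, then finish with the table.
       Matches the convention clz(0) == 32.  */
    int clzsi2_model (uint32_t x)
    {
      static const unsigned char tab[16] =
        { 4, 3, 2, 2, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0 };
      int n = 28;
      if (x >= (1u << 16)) { x >>= 16; n -= 16; }
      if (x >= (1u << 8))  { x >>= 8;  n -= 8;  }
      if (x >= (1u << 4))  { x >>= 4;  n -= 4;  }
      return tab[x] + n;    /* x is now 0..15 */
    }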
[PATCH v6 07/34] Refactor 'ctz' functions into a new file
This will make it easier to isolate changes in subsequent patches. gcc/libgcc/ChangeLog: 2021-01-13 Daniel Engel * config/arm/lib1funcs.S (__ctzsi2): Moved to ... * config/arm/ctz2.S: New file. --- libgcc/config/arm/ctz2.S | 86 +++ libgcc/config/arm/lib1funcs.S | 65 +- 2 files changed, 87 insertions(+), 64 deletions(-) create mode 100644 libgcc/config/arm/ctz2.S diff --git a/libgcc/config/arm/ctz2.S b/libgcc/config/arm/ctz2.S new file mode 100644 index 000..8702c9afb94 --- /dev/null +++ b/libgcc/config/arm/ctz2.S @@ -0,0 +1,86 @@ +/* Copyright (C) 1995-2021 Free Software Foundation, Inc. + +This file is free software; you can redistribute it and/or modify it +under the terms of the GNU General Public License as published by the +Free Software Foundation; either version 3, or (at your option) any +later version. + +This file is distributed in the hope that it will be useful, but +WITHOUT ANY WARRANTY; without even the implied warranty of +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU +General Public License for more details. + +Under Section 7 of GPL version 3, you are granted additional +permissions described in the GCC Runtime Library Exception, version +3.1, as published by the Free Software Foundation. + +You should have received a copy of the GNU General Public License and +a copy of the GCC Runtime Library Exception along with this program; +see the files COPYING3 and COPYING.RUNTIME respectively. If not, see +<http://www.gnu.org/licenses/>. */ + + +#ifdef L_ctzsi2 +#ifdef NOT_ISA_TARGET_32BIT +FUNC_START ctzsi2 + negsr1, r0 + andsr0, r0, r1 + movsr1, #28 + movsr3, #1 + lslsr3, r3, #16 + cmp r0, r3 /* 0x1 */ + bcc 2f + lsrsr0, r0, #16 + subsr1, r1, #16 +2: lsrsr3, r3, #8 + cmp r0, r3 /* #0x100 */ + bcc 2f + lsrsr0, r0, #8 + subsr1, r1, #8 +2: lsrsr3, r3, #4 + cmp r0, r3 /* #0x10 */ + bcc 2f + lsrsr0, r0, #4 + subsr1, r1, #4 +2: adr r2, 1f + ldrbr0, [r2, r0] + subsr0, r0, r1 + bx lr +.align 2 +1: +.byte 27, 28, 29, 29, 30, 30, 30, 30, 31, 31, 31, 31, 31, 31, 31, 31 + FUNC_END ctzsi2 +#else +ARM_FUNC_START ctzsi2 + rsb r1, r0, #0 + and r0, r0, r1 +# if defined (__ARM_FEATURE_CLZ) + clz r0, r0 + rsb r0, r0, #31 + RET +# else + mov r1, #28 + cmp r0, #0x1 + do_it cs, t + movcs r0, r0, lsr #16 + subcs r1, r1, #16 + cmp r0, #0x100 + do_it cs, t + movcs r0, r0, lsr #8 + subcs r1, r1, #8 + cmp r0, #0x10 + do_it cs, t + movcs r0, r0, lsr #4 + subcs r1, r1, #4 + adr r2, 1f + ldrbr0, [r2, r0] + sub r0, r0, r1 + RET +.align 2 +1: +.byte 27, 28, 29, 29, 30, 30, 30, 30, 31, 31, 31, 31, 31, 31, 31, 31 +# endif /* !defined (__ARM_FEATURE_CLZ) */ + FUNC_END ctzsi2 +#endif +#endif /* L_clzsi2 */ + diff --git a/libgcc/config/arm/lib1funcs.S b/libgcc/config/arm/lib1funcs.S index d92f73ba0c9..b1df00ac597 100644 --- a/libgcc/config/arm/lib1funcs.S +++ b/libgcc/config/arm/lib1funcs.S @@ -1804,70 +1804,7 @@ LSYM(Lover12): #endif /* __symbian__ */ #include "clz2.S" - -#ifdef L_ctzsi2 -#ifdef NOT_ISA_TARGET_32BIT -FUNC_START ctzsi2 - negsr1, r0 - andsr0, r0, r1 - movsr1, #28 - movsr3, #1 - lslsr3, r3, #16 - cmp r0, r3 /* 0x1 */ - bcc 2f - lsrsr0, r0, #16 - subsr1, r1, #16 -2: lsrsr3, r3, #8 - cmp r0, r3 /* #0x100 */ - bcc 2f - lsrsr0, r0, #8 - subsr1, r1, #8 -2: lsrsr3, r3, #4 - cmp r0, r3 /* #0x10 */ - bcc 2f - lsrsr0, r0, #4 - subsr1, r1, #4 -2: adr r2, 1f - ldrbr0, [r2, r0] - subsr0, r0, r1 - bx lr -.align 2 -1: -.byte 27, 28, 29, 29, 30, 30, 30, 30, 31, 31, 31, 31, 31, 31, 31, 31 - FUNC_END ctzsi2 -#else -ARM_FUNC_START ctzsi2 - rsb r1, r0, #0 - and r0, r0, r1 -# if defined 
(__ARM_FEATURE_CLZ) - clz r0, r0 - rsb r0, r0, #31 - RET -# else - mov r1, #28 - cmp r0, #0x1 - do_it cs, t - movcs r0, r0, lsr #16 - subcs r1, r1, #16 - cmp r0, #0x100 - do_it cs, t - movcs r0, r0, lsr #8 - subcs r1, r1, #8 - cmp r0, #0x10 - do_it cs, t - movcs r0, r0, lsr #4 - subcs r1, r1, #4 - adr r2, 1f - ldrbr0, [r2, r0] - sub r0, r0, r1 - RET -.align 2 -1: -.byte 27, 28, 29, 29, 30, 30, 30, 30, 31, 31, 31, 31, 31, 31, 31, 31 -# endif /* !defined (_
[PATCH v6 08/34] Refactor 64-bit shift functions into a new file
This will make it easier to isolate changes in subsequent patches.

gcc/libgcc/ChangeLog:
2021-01-13 Daniel Engel

	* config/arm/lib1funcs.S (__ashldi3, __ashrdi3, __lshrdi3):
	Moved to ...
	* config/arm/eabi/lshift.S: New file.
---
 libgcc/config/arm/eabi/lshift.S | 123 
 libgcc/config/arm/lib1funcs.S   | 103 +-
 2 files changed, 124 insertions(+), 102 deletions(-)
 create mode 100644 libgcc/config/arm/eabi/lshift.S

diff --git a/libgcc/config/arm/eabi/lshift.S b/libgcc/config/arm/eabi/lshift.S
new file mode 100644
index 000..0974a72c377
--- /dev/null
+++ b/libgcc/config/arm/eabi/lshift.S
@@ -0,0 +1,123 @@
+/* Copyright (C) 1995-2021 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 3, or (at your option) any
+later version.
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+General Public License for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+<http://www.gnu.org/licenses/>.  */
+
+
+#ifdef L_lshrdi3
+
+	FUNC_START lshrdi3
+	FUNC_ALIAS aeabi_llsr lshrdi3
+
+#ifdef __thumb__
+	lsrs	al, r2
+	movs	r3, ah
+	lsrs	ah, r2
+	mov	ip, r3
+	subs	r2, #32
+	lsrs	r3, r2
+	orrs	al, r3
+	negs	r2, r2
+	mov	r3, ip
+	lsls	r3, r2
+	orrs	al, r3
+	RET
+#else
+	subs	r3, r2, #32
+	rsb	ip, r2, #32
+	movmi	al, al, lsr r2
+	movpl	al, ah, lsr r3
+	orrmi	al, al, ah, lsl ip
+	mov	ah, ah, lsr r2
+	RET
+#endif
+	FUNC_END aeabi_llsr
+	FUNC_END lshrdi3
+
+#endif
+
+#ifdef L_ashrdi3
+
+	FUNC_START ashrdi3
+	FUNC_ALIAS aeabi_lasr ashrdi3
+
+#ifdef __thumb__
+	lsrs	al, r2
+	movs	r3, ah
+	asrs	ah, r2
+	subs	r2, #32
+	@ If r2 is negative at this point the following step would OR
+	@ the sign bit into all of AL.  That's not what we want...
+	bmi	1f
+	mov	ip, r3
+	asrs	r3, r2
+	orrs	al, r3
+	mov	r3, ip
+1:
+	negs	r2, r2
+	lsls	r3, r2
+	orrs	al, r3
+	RET
+#else
+	subs	r3, r2, #32
+	rsb	ip, r2, #32
+	movmi	al, al, lsr r2
+	movpl	al, ah, asr r3
+	orrmi	al, al, ah, lsl ip
+	mov	ah, ah, asr r2
+	RET
+#endif
+
+	FUNC_END aeabi_lasr
+	FUNC_END ashrdi3
+
+#endif
+
+#ifdef L_ashldi3
+
+	FUNC_START ashldi3
+	FUNC_ALIAS aeabi_llsl ashldi3
+
+#ifdef __thumb__
+	lsls	ah, r2
+	movs	r3, al
+	lsls	al, r2
+	mov	ip, r3
+	subs	r2, #32
+	lsls	r3, r2
+	orrs	ah, r3
+	negs	r2, r2
+	mov	r3, ip
+	lsrs	r3, r2
+	orrs	ah, r3
+	RET
+#else
+	subs	r3, r2, #32
+	rsb	ip, r2, #32
+	movmi	ah, ah, lsl r2
+	movpl	ah, al, lsl r3
+	orrmi	ah, ah, al, lsr ip
+	mov	al, al, lsl r2
+	RET
+#endif
+	FUNC_END aeabi_llsl
+	FUNC_END ashldi3
+
+#endif
+
diff --git a/libgcc/config/arm/lib1funcs.S b/libgcc/config/arm/lib1funcs.S
index b1df00ac597..7ac50230725 100644
--- a/libgcc/config/arm/lib1funcs.S
+++ b/libgcc/config/arm/lib1funcs.S
@@ -1699,108 +1699,7 @@ LSYM(Lover12):
 
 /* Prevent __aeabi double-word shifts from being produced on SymbianOS.
*/ #ifndef __symbian__ - -#ifdef L_lshrdi3 - - FUNC_START lshrdi3 - FUNC_ALIAS aeabi_llsr lshrdi3 - -#ifdef __thumb__ - lsrsal, r2 - movsr3, ah - lsrsah, r2 - mov ip, r3 - subsr2, #32 - lsrsr3, r2 - orrsal, r3 - negsr2, r2 - mov r3, ip - lslsr3, r2 - orrsal, r3 - RET -#else - subsr3, r2, #32 - rsb ip, r2, #32 - movmi al, al, lsr r2 - movpl al, ah, lsr r3 - orrmi al, al, ah, lsl ip - mov ah, ah, lsr r2 - RET -#endif - FUNC_END aeabi_llsr - FUNC_END lshrdi3 - -#endif - -#ifdef L_ashrdi3 - - FUNC_START ashrdi3 - FUNC_ALIAS aeabi_lasr ashrdi3 - -#ifdef __thumb__ - lsrsal, r2 - movsr3, ah - asrsah, r2 - subs
[PATCH v6 09/34] Import 'clz' functions from the CM0 library
On architectures without __ARM_FEATURE_CLZ, this version combines __clzdi2() with __clzsi2() into a single object with an efficient tail call. Also, this version merges the formerly separate Thumb and ARM code implementations into a unified instruction sequence. This change significantly improves Thumb performance without affecting ARM performance. Finally, this version adds a new __OPTIMIZE_SIZE__ build option (binary search loop).

There is no change to the code for architectures with __ARM_FEATURE_CLZ.

gcc/libgcc/ChangeLog:
2021-01-13 Daniel Engel

	* config/arm/clz2.S (__clzsi2, __clzdi2): Reduced code size on
	architectures without __ARM_FEATURE_CLZ.
	* config/arm/t-elf (LIB1ASMFUNCS): Moved _clzsi2 to new weak group.
---
 libgcc/config/arm/clz2.S | 363 +--
 libgcc/config/arm/t-elf  |   7 +-
 2 files changed, 237 insertions(+), 133 deletions(-)

diff --git a/libgcc/config/arm/clz2.S b/libgcc/config/arm/clz2.S
index 2ad9a81892c..51ee35fbe78 100644
--- a/libgcc/config/arm/clz2.S
+++ b/libgcc/config/arm/clz2.S
@@ -1,145 +1,244 @@
-/* Copyright (C) 1995-2021 Free Software Foundation, Inc.
+/* clz2.S: Cortex M0 optimized 'clz' functions
 
-This file is free software; you can redistribute it and/or modify it
-under the terms of the GNU General Public License as published by the
-Free Software Foundation; either version 3, or (at your option) any
-later version.
+   Copyright (C) 2018-2021 Free Software Foundation, Inc.
+   Contributed by Daniel Engel (g...@danielengel.com)
 
-This file is distributed in the hope that it will be useful, but
-WITHOUT ANY WARRANTY; without even the implied warranty of
-MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-General Public License for more details.
+   This file is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by the
+   Free Software Foundation; either version 3, or (at your option) any
+   later version.
 
-Under Section 7 of GPL version 3, you are granted additional
-permissions described in the GCC Runtime Library Exception, version
-3.1, as published by the Free Software Foundation.
+   This file is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
 
-You should have received a copy of the GNU General Public License and
-a copy of the GCC Runtime Library Exception along with this program;
-see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
-<http://www.gnu.org/licenses/>.  */
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+
+#if defined(__ARM_FEATURE_CLZ) && __ARM_FEATURE_CLZ
+
+#ifdef L_clzdi2
+
+// int __clzdi2(long long)
+// Counts leading zero bits in $r1:$r0.
+// Returns the result in $r0.
+FUNC_START_SECTION clzdi2 .text.sorted.libgcc.clz2.clzdi2 +CFI_START_FUNCTION + +// Moved here from lib1funcs.S +cmp xxh,#0 +do_it eq, et +clzeq r0, xxl +clzne r0, xxh +addeq r0, #32 +RET + +CFI_END_FUNCTION +FUNC_END clzdi2 + +#endif /* L_clzdi2 */ #ifdef L_clzsi2 -#ifdef NOT_ISA_TARGET_32BIT -FUNC_START clzsi2 - movsr1, #28 - movsr3, #1 - lslsr3, r3, #16 - cmp r0, r3 /* 0x1 */ - bcc 2f - lsrsr0, r0, #16 - subsr1, r1, #16 -2: lsrsr3, r3, #8 - cmp r0, r3 /* #0x100 */ - bcc 2f - lsrsr0, r0, #8 - subsr1, r1, #8 -2: lsrsr3, r3, #4 - cmp r0, r3 /* #0x10 */ - bcc 2f - lsrsr0, r0, #4 - subsr1, r1, #4 -2: adr r2, 1f - ldrbr0, [r2, r0] - addsr0, r0, r1 - bx lr -.align 2 -1: -.byte 4, 3, 2, 2, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0 - FUNC_END clzsi2 -#else -ARM_FUNC_START clzsi2 -# if defined (__ARM_FEATURE_CLZ) - clz r0, r0 - RET -# else - mov r1, #28 - cmp r0, #0x1 - do_it cs, t - movcs r0, r0, lsr #16 - subcs r1, r1, #16 - cmp r0, #0x100 - do_it cs, t - movcs r0, r0, lsr #8 - subcs r1, r1, #8 - cmp r0, #0x10 - do_it cs, t - movcs r0, r0, lsr #4 - subcs r1, r1, #4 - adr r2, 1f - ldrbr0, [r2, r0] - add r0, r0, r1 - RET -.align
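To summarize the structure described above (a behavioral sketch of the intended semantics, not the patch's exact code, reusing the clzsi2_model() sketch shown earlier): the 64-bit count reduces to the 32-bit primitive, which is why a tail call lets both entry points share one object.

    /* clzsi2_model() as sketched under the earlier refactoring patch.  */
    extern int clzsi2_model (unsigned int);

    /* Model of the __clzdi2/__clzsi2 pairing: the upper word selects
       which 32-bit count is used, with a +32 bias when it is zero.  */
    int clzdi2_model (unsigned long long x)
    {
        unsigned int hi = (unsigned int)(x >> 32);
        if (hi != 0)
            return clzsi2_model (hi);            /* tail call in the asm */
        return 32 + clzsi2_model ((unsigned int)x);
    }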
[PATCH v6 10/34] Import 'ctz' functions from the CM0 library
This version combines __ctzdi2() with __ctzsi2() into a single object with an efficient tail call. The former implementation of __ctzdi2() was in C.

On architectures without __ARM_FEATURE_CLZ, this version merges the formerly separate Thumb and ARM code sequences into a unified instruction sequence. This change significantly improves Thumb performance without affecting ARM performance. Finally, this version adds a new __OPTIMIZE_SIZE__ build option.

On architectures with __ARM_FEATURE_CLZ, __ctzsi2(0) now returns 32. Formerly, __ctzsi2(0) would return -1. Architectures without __ARM_FEATURE_CLZ have always returned 32, so this change makes the return value consistent. This change costs 2 extra instructions (branchless). Likewise on architectures with __ARM_FEATURE_CLZ, __ctzdi2(0) now returns 64 instead of 31.

gcc/libgcc/ChangeLog:
2021-01-13 Daniel Engel

	* config/arm/ctz2.S (__ctzdi2): Added a new function.
	(__ctzsi2): Reduced size on architectures without __ARM_FEATURE_CLZ;
	changed so __ctzsi2(0)=32 on architectures with __ARM_FEATURE_CLZ.
	* config/arm/t-elf (LIB1ASMFUNCS): Added _ctzdi2; moved _ctzsi2 to
	the weak function objects group.
---
 libgcc/config/arm/ctz2.S | 308 +--
 libgcc/config/arm/t-elf  |   3 +-
 2 files changed, 233 insertions(+), 78 deletions(-)

diff --git a/libgcc/config/arm/ctz2.S b/libgcc/config/arm/ctz2.S
index 8702c9afb94..dc436af3571 100644
--- a/libgcc/config/arm/ctz2.S
+++ b/libgcc/config/arm/ctz2.S
@@ -1,86 +1,240 @@
-/* Copyright (C) 1995-2021 Free Software Foundation, Inc.
+/* ctz2.S: ARM optimized 'ctz' functions
 
-This file is free software; you can redistribute it and/or modify it
-under the terms of the GNU General Public License as published by the
-Free Software Foundation; either version 3, or (at your option) any
-later version.
+   Copyright (C) 2020-2021 Free Software Foundation, Inc.
+   Contributed by Daniel Engel (g...@danielengel.com)
 
-This file is distributed in the hope that it will be useful, but
-WITHOUT ANY WARRANTY; without even the implied warranty of
-MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-General Public License for more details.
+   This file is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by the
+   Free Software Foundation; either version 3, or (at your option) any
+   later version.
 
-Under Section 7 of GPL version 3, you are granted additional
-permissions described in the GCC Runtime Library Exception, version
-3.1, as published by the Free Software Foundation.
+   This file is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
 
-You should have received a copy of the GNU General Public License and
-a copy of the GCC Runtime Library Exception along with this program;
-see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
-<http://www.gnu.org/licenses/>.  */
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
 
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
 
-#ifdef L_ctzsi2
-#ifdef NOT_ISA_TARGET_32BIT
-FUNC_START ctzsi2
-	negs	r1, r0
-	ands	r0, r0, r1
-	movs	r1, #28
-	movs	r3, #1
-	lsls	r3, r3, #16
-	cmp	r0, r3 /* 0x10000 */
-	bcc	2f
-	lsrs	r0, r0, #16
-	subs	r1, r1, #16
-2:	lsrs	r3, r3, #8
-	cmp	r0, r3 /* #0x100 */
-	bcc	2f
-	lsrs	r0, r0, #8
-	subs	r1, r1, #8
-2:	lsrs	r3, r3, #4
-	cmp	r0, r3 /* #0x10 */
-	bcc	2f
-	lsrs	r0, r0, #4
-	subs	r1, r1, #4
-2:	adr	r2, 1f
-	ldrb	r0, [r2, r0]
-	subs	r0, r0, r1
-	bx	lr
-.align 2
-1:
-.byte 27, 28, 29, 29, 30, 30, 30, 30, 31, 31, 31, 31, 31, 31, 31, 31
-	FUNC_END ctzsi2
+
+// When the hardware 'clz' function is available, an efficient version
+// of __ctzsi2(x) can be created by calculating '31 - __clzsi2(lsb(x))',
+// where lsb(x) is 'x' with only the least-significant '1' bit set.
+// The following offset applies to all of the functions in this file.
+#if defined(__ARM_FEATURE_CLZ) && __ARM_FEATURE_CLZ
+  #define CTZ_RESULT_OFFSET 1
 #else
-ARM_FUNC_START ctzsi2
-	rsb	r1, r0, #0
-	and	r0, r0, r1
-# if defined (__ARM_FEATURE_CLZ)
-	clz	r0, r0
-	rsb
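The identity quoted in the comment block above can be written out in C as follows (a sketch under the stated assumptions; the model names are invented here, and the ctz(0) = 32 case is what the extra branchless instructions in the commit message pay for):

    extern int clzsi2_model (unsigned int);

    int ctzsi2_model (unsigned int x)
    {
        if (x == 0)                  /* the asm handles this branchlessly */
            return 32;
        /* x & -x isolates the least-significant '1' bit.  */
        return 31 - clzsi2_model (x & -x);
    }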
[PATCH v6 11/34] Import 64-bit shift functions from the CM0 library
The Thumb versions of these functions are each 1-2 instructions smaller and faster, and branchless when the IT instruction is available. The ARM versions were converted to the "xxl/xxh" big-endian register naming convention, but are otherwise unchanged.

gcc/libgcc/ChangeLog:
2021-01-13 Daniel Engel

	* config/arm/eabi/lshift.S (__ashldi3, __ashrdi3, __lshrdi3):
	Reduced code size on Thumb architectures; updated big-endian
	register naming convention to "xxl/xxh".
---
 libgcc/config/arm/eabi/lshift.S | 338 +---
 1 file changed, 228 insertions(+), 110 deletions(-)

diff --git a/libgcc/config/arm/eabi/lshift.S b/libgcc/config/arm/eabi/lshift.S
index 0974a72c377..16cf2dcef04 100644
--- a/libgcc/config/arm/eabi/lshift.S
+++ b/libgcc/config/arm/eabi/lshift.S
@@ -1,123 +1,241 @@
-/* Copyright (C) 1995-2021 Free Software Foundation, Inc.
+/* lshift.S: ARM optimized 64-bit integer shift
 
-This file is free software; you can redistribute it and/or modify it
-under the terms of the GNU General Public License as published by the
-Free Software Foundation; either version 3, or (at your option) any
-later version.
+   Copyright (C) 2018-2021 Free Software Foundation, Inc.
+   Contributed by Daniel Engel, Senva Inc (g...@danielengel.com)
 
-This file is distributed in the hope that it will be useful, but
-WITHOUT ANY WARRANTY; without even the implied warranty of
-MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-General Public License for more details.
+   This file is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by the
+   Free Software Foundation; either version 3, or (at your option) any
+   later version.
 
-Under Section 7 of GPL version 3, you are granted additional
-permissions described in the GCC Runtime Library Exception, version
-3.1, as published by the Free Software Foundation.
+   This file is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
 
-You should have received a copy of the GNU General Public License and
-a copy of the GCC Runtime Library Exception along with this program;
-see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
-<http://www.gnu.org/licenses/>.  */
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
 
 
 #ifdef L_lshrdi3
 
-	FUNC_START lshrdi3
-	FUNC_ALIAS aeabi_llsr lshrdi3
-
-#ifdef __thumb__
-	lsrs	al, r2
-	movs	r3, ah
-	lsrs	ah, r2
-	mov	ip, r3
-	subs	r2, #32
-	lsrs	r3, r2
-	orrs	al, r3
-	negs	r2, r2
-	mov	r3, ip
-	lsls	r3, r2
-	orrs	al, r3
-	RET
-#else
-	subs	r3, r2, #32
-	rsb	ip, r2, #32
-	movmi	al, al, lsr r2
-	movpl	al, ah, lsr r3
-	orrmi	al, al, ah, lsl ip
-	mov	ah, ah, lsr r2
-	RET
-#endif
-	FUNC_END aeabi_llsr
-	FUNC_END lshrdi3
-
-#endif
-
+// long long __aeabi_llsr(long long, int)
+// Logical shift right the 64 bit value in $r1:$r0 by the count in $r2.
+// The result is only guaranteed for shifts in the range of '0' to '63'.
+// Uses $r3 as scratch space.
+FUNC_START_SECTION aeabi_llsr .text.sorted.libgcc.lshrdi3 +FUNC_ALIAS lshrdi3 aeabi_llsr +CFI_START_FUNCTION + + #if defined(__thumb__) && __thumb__ + +// Save a copy for the remainder. +movsr3, xxh + +// Assume a simple shift. +lsrsxxl,r2 +lsrsxxh,r2 + +// Test if the shift distance is larger than 1 word. +subsr2, #32 + +#ifdef __HAVE_FEATURE_IT +do_it lo,te + +// The remainder is opposite the main shift, (32 - x) bits. +rsblo r2, #0 +lsllo r3, r2 + +// The remainder shift extends into the hi word. +lsrhs r3, r2 + +#else /* !__HAVE_FEATURE_IT */ +bhs LLSYM(__llsr_large) + +// The remainder is opposite the main shift, (32 - x) bits. +rsbsr2, #0 +lslsr3, r2 + +// Cancel any remaining shift. +eorsr2, r2 + + LLSYM(__llsr_large): +// Apply any remaining shift to the hi word. +lsrsr3, r2 + +#endif /* !__HAVE_FEATURE_IT */ + +// Merge remainder and
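For reference, the remainder-merging strategy in __aeabi_llsr above corresponds to this C model (illustrative only; as the patch's comment states, shift counts of 0 to 63 only):

    unsigned long long llsr_model (unsigned long long val, int n)
    {
        unsigned int lo = (unsigned int)val;
        unsigned int hi = (unsigned int)(val >> 32);

        if (n == 0)
            return val;
        if (n < 32) {
            /* The low word picks up the bits shifted out of the high
               word: the "remainder" is the opposite shift, (32 - n).  */
            lo = (lo >> n) | (hi << (32 - n));
            hi >>= n;
        } else {
            lo = hi >> (n - 32);
            hi = 0;
        }
        return ((unsigned long long)hi << 32) | lo;
    }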
[PATCH v6 12/34] Import 'clrsb' functions from the CM0 library
This implementation provides an efficient tail call to __clzsi2(), making the functions rather smaller and faster than the C versions.

gcc/libgcc/ChangeLog:
2021-01-13 Daniel Engel

	* config/arm/clz2.S (__clrsbsi2, __clrsbdi2): Added new functions.
	* config/arm/t-elf (LIB1ASMFUNCS): Added new function objects
	_clrsbsi2 and _clrsbdi2.
---
 libgcc/config/arm/clz2.S | 108 ++-
 libgcc/config/arm/t-elf  |   2 +
 2 files changed, 108 insertions(+), 2 deletions(-)

diff --git a/libgcc/config/arm/clz2.S b/libgcc/config/arm/clz2.S
index 51ee35fbe78..a2de45ff651 100644
--- a/libgcc/config/arm/clz2.S
+++ b/libgcc/config/arm/clz2.S
@@ -1,4 +1,4 @@
-/* clz2.S: Cortex M0 optimized 'clz' functions
+/* clz2.S: ARM optimized 'clz' and related functions
 
    Copyright (C) 2018-2021 Free Software Foundation, Inc.
    Contributed by Daniel Engel (g...@danielengel.com)
@@ -23,7 +23,7 @@
    <http://www.gnu.org/licenses/>.  */
 
 
-#if defined(__ARM_FEATURE_CLZ) && __ARM_FEATURE_CLZ
+#ifdef __ARM_FEATURE_CLZ
 
 #ifdef L_clzdi2
 
@@ -242,3 +242,107 @@ FUNC_END clzdi2
 
 #endif /* !__ARM_FEATURE_CLZ */
 
+
+#ifdef L_clrsbdi2
+
+// int __clrsbdi2(long long)
+// Counts the number of "redundant sign bits" in $r1:$r0.
+// Returns the result in $r0.
+// Uses $r2 and $r3 as scratch space.
+FUNC_START_SECTION clrsbdi2 .text.sorted.libgcc.clz2.clrsbdi2
+	CFI_START_FUNCTION
+
+  #if defined(__ARM_FEATURE_CLZ) && __ARM_FEATURE_CLZ
+	// Invert negative signs to keep counting zeros.
+	asrs	r3, xxh, #31
+	eors	xxl, r3
+	eors	xxh, r3
+
+	// Same as __clzdi2(), except that the 'C' flag is pre-calculated.
+	// Also, the trailing 'subs', since the last bit is not redundant.
+	do_it	eq, et
+	clzeq	r0, xxl
+	clzne	r0, xxh
+	addeq	r0, #32
+	subs	r0, #1
+	RET
+
+  #else /* !__ARM_FEATURE_CLZ */
+	// Result if all the bits in the argument are zero.
+	// Set it here to keep the flags clean after 'eors' below.
+	movs	r2, #31
+
+	// Invert negative signs to keep counting zeros.
+	asrs	r3, xxh, #31
+	eors	xxh, r3
+
+#if defined(__ARMEB__) && __ARMEB__
+	// If the upper word is non-zero, return '__clzsi2(upper) - 1'.
+	bne	SYM(__internal_clzsi2)
+
+	// The upper word is zero, prepare the lower word.
+	movs	r0, r1
+	eors	r0, r3
+
+#else /* !__ARMEB__ */
+	// Save the lower word temporarily.
+	// This somewhat awkward construction adds one cycle when the
+	// branch is not taken, but prevents a double-branch.
+	eors	r3, r0
+
+	// If the upper word is non-zero, return '__clzsi2(upper) - 1'.
+	movs	r0, r1
+	bne	SYM(__internal_clzsi2)
+
+	// Restore the lower word.
+	movs	r0, r3
+
+#endif /* !__ARMEB__ */
+
+	// The upper word is zero, return '31 + __clzsi2(lower)'.
+	adds	r2, #32
+	b	SYM(__internal_clzsi2)
+
+  #endif /* !__ARM_FEATURE_CLZ */
+
+	CFI_END_FUNCTION
+FUNC_END clrsbdi2
+
+#endif /* L_clrsbdi2 */
+
+
+#ifdef L_clrsbsi2
+
+// int __clrsbsi2(int)
+// Counts the number of "redundant sign bits" in $r0.
+// Returns the result in $r0.
+// Uses $r2 and possibly $r3 as scratch space.
+FUNC_START_SECTION clrsbsi2 .text.sorted.libgcc.clz2.clrsbsi2
+	CFI_START_FUNCTION
+
+	// Invert negative signs to keep counting zeros.
+	asrs	r2, r0, #31
+	eors	r0, r2
+
+  #if defined(__ARM_FEATURE_CLZ) && __ARM_FEATURE_CLZ
+	// Count.
+	clz	r0, r0
+
+	// The result for a positive value will always be >= 1.
+	// By definition, the last bit is not redundant.
+	subs	r0, #1
+	RET
+
+  #else /* !__ARM_FEATURE_CLZ */
+	// Result if all the bits in the argument are zero.
+	// By definition, the last bit is not redundant.
+movsr2, #31 +b SYM(__internal_clzsi2) + + #endif /* !__ARM_FEATURE_CLZ */ + +CFI_END_FUNCTION +FUNC_END clrsbsi2 + +#endif /* L_clrsbsi2 */ + diff --git a/libgcc/config/arm/t-elf b/libgcc/config/arm/t-elf index 33b83ac4adf..89071cebe45 100644 --- a/libgcc/config/arm/t-elf +++ b/libgcc/config/arm/t-elf @@ -31,6 +31,8 @@ LIB1ASMFUNCS += \ _ashldi3 \ _ashrdi3 \ _lshrdi3 \ + _clrsbsi2 \ + _clrsbdi2 \ _clzdi2 \ _ctzdi2 \ _dvmd_tls \ -- 2.25.1
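The trick used by both clrsb entry points above is the same: XOR with the sign word, then count leading zeros. In C (a sketch; the -1 reflects that the sign bit itself is never counted as redundant):

    extern int clzsi2_model (unsigned int);

    int clrsbsi2_model (int x)
    {
        /* Flip negative values so sign copies become leading zeros.  */
        unsigned int y = (unsigned int)x ^ (unsigned int)(x >> 31);
        return clzsi2_model (y) - 1;    /* clrsb(0) == clrsb(-1) == 31 */
    }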
[PATCH v6 13/34] Import 'ffs' functions from the CM0 library
This implementation provides an efficient tail call to __ctzsi2(), making the functions rather smaller and faster than the C versions.

gcc/libgcc/ChangeLog:
2021-01-13 Daniel Engel

	* config/arm/ctz2.S (__ffssi2, __ffsdi2): New functions.
	* config/arm/t-elf (LIB1ASMFUNCS): Added _ffssi2 and _ffsdi2.
---
 libgcc/config/arm/ctz2.S | 77 +++-
 libgcc/config/arm/t-elf  |  2 ++
 2 files changed, 78 insertions(+), 1 deletion(-)

diff --git a/libgcc/config/arm/ctz2.S b/libgcc/config/arm/ctz2.S
index dc436af3571..b9528a061a2 100644
--- a/libgcc/config/arm/ctz2.S
+++ b/libgcc/config/arm/ctz2.S
@@ -1,4 +1,4 @@
-/* ctz2.S: ARM optimized 'ctz' functions
+/* ctz2.S: ARM optimized 'ctz' and related functions
 
    Copyright (C) 2020-2021 Free Software Foundation, Inc.
    Contributed by Daniel Engel (g...@danielengel.com)
@@ -238,3 +238,78 @@ FUNC_END ctzdi2
 
 #endif /* L_ctzsi2 || L_ctzdi2 */
 
+
+#ifdef L_ffsdi2
+
+// int __ffsdi2(long long)
+// Return the index of the least significant 1-bit in $r1:r0,
+// or zero if $r1:r0 is zero.  The least significant bit is index 1.
+// Returns the result in $r0.
+// Uses $r2 and possibly $r3 as scratch space.
+// Same section as __ctzsi2() for sake of the tail call branches.
+FUNC_START_SECTION ffsdi2 .text.sorted.libgcc.ctz2.ffsdi2
+	CFI_START_FUNCTION
+
+	// Simplify branching by assuming a non-zero lower word.
+	// For all such, ffssi2(x) == ctzsi2(x) + 1.
+	movs	r2, #(33 - CTZ_RESULT_OFFSET)
+
+  #if defined(__ARMEB__) && __ARMEB__
+	// HACK: Save the upper word in a scratch register.
+	movs	r3, r0
+
+	// Test the lower word.
+	movs	r0, r1
+	bne	SYM(__internal_ctzsi2)
+
+	// Test the upper word.
+	movs	r2, #(65 - CTZ_RESULT_OFFSET)
+	movs	r0, r3
+	bne	SYM(__internal_ctzsi2)
+
+  #else /* !__ARMEB__ */
+	// Test the lower word.
+	cmp	r0, #0
+	bne	SYM(__internal_ctzsi2)
+
+	// Test the upper word.
+	movs	r2, #(65 - CTZ_RESULT_OFFSET)
+	movs	r0, r1
+	bne	SYM(__internal_ctzsi2)
+
+  #endif /* !__ARMEB__ */
+
+	// Upper and lower words are both zero.
+	RET
+
+	CFI_END_FUNCTION
+FUNC_END ffsdi2
+
+#endif /* L_ffsdi2 */
+
+
+#ifdef L_ffssi2
+
+// int __ffssi2(int)
+// Return the index of the least significant 1-bit in $r0,
+// or zero if $r0 is zero.  The least significant bit is index 1.
+// Returns the result in $r0.
+// Uses $r2 and possibly $r3 as scratch space.
+// Same section as __ctzsi2() for sake of the tail call branches.
+FUNC_START_SECTION ffssi2 .text.sorted.libgcc.ctz2.ffssi2
+	CFI_START_FUNCTION
+
+	// Simplify branching by assuming a non-zero argument.
+	// For all such, ffssi2(x) == ctzsi2(x) + 1.
+	movs	r2, #(33 - CTZ_RESULT_OFFSET)
+
+	// Test for zero, return unmodified.
+	cmp	r0, #0
+	bne	SYM(__internal_ctzsi2)
+	RET
+
+	CFI_END_FUNCTION
+FUNC_END ffssi2
+
+#endif /* L_ffssi2 */
+
diff --git a/libgcc/config/arm/t-elf b/libgcc/config/arm/t-elf
index 89071cebe45..346fc766f17 100644
--- a/libgcc/config/arm/t-elf
+++ b/libgcc/config/arm/t-elf
@@ -35,6 +35,8 @@ LIB1ASMFUNCS += \
 	_clrsbdi2 \
 	_clzdi2 \
 	_ctzdi2 \
+	_ffssi2 \
+	_ffsdi2 \
 	_dvmd_tls \
 	_divsi3 \
 	_modsi3 \
-- 
2.25.1
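The relationship to the ctz entry points is simply the following (behavioral sketch only; the constants correspond to the '33 - CTZ_RESULT_OFFSET' and '65 - CTZ_RESULT_OFFSET' biases loaded before the tail calls above):

    extern int ctzsi2_model (unsigned int);

    int ffssi2_model (unsigned int x)
    {
        return x ? ctzsi2_model (x) + 1 : 0;
    }

    int ffsdi2_model (unsigned long long x)
    {
        unsigned int lo = (unsigned int)x;
        unsigned int hi = (unsigned int)(x >> 32);
        if (lo) return ctzsi2_model (lo) + 1;    /* bias 1  */
        if (hi) return ctzsi2_model (hi) + 33;   /* bias 33 */
        return 0;
    }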
[PATCH v6 14/34] Import 'parity' functions from the CM0 library
The functional overlap between the single- and double-word functions makes this implementation about half the size of the C functions if both functions are linked in the same application.

gcc/libgcc/ChangeLog:
2021-01-13 Daniel Engel

	* config/arm/parity.S: New file for __paritysi2/di2().
	* config/arm/lib1funcs.S: #include parity.S.
	* config/arm/t-elf (LIB1ASMFUNCS): Added _paritysi2/di2.
---
 libgcc/config/arm/lib1funcs.S |   1 +
 libgcc/config/arm/parity.S    | 120 ++
 libgcc/config/arm/t-elf       |   2 +
 3 files changed, 123 insertions(+)
 create mode 100644 libgcc/config/arm/parity.S

diff --git a/libgcc/config/arm/lib1funcs.S b/libgcc/config/arm/lib1funcs.S
index 7ac50230725..600ea2dfdc9 100644
--- a/libgcc/config/arm/lib1funcs.S
+++ b/libgcc/config/arm/lib1funcs.S
@@ -1704,6 +1704,7 @@ LSYM(Lover12):
 
 #include "clz2.S"
 #include "ctz2.S"
+#include "parity.S"
 
 /* */
 /* These next two sections are here despite the fact that they contain Thumb

diff --git a/libgcc/config/arm/parity.S b/libgcc/config/arm/parity.S
new file mode 100644
index 000..45233bc9d8f
--- /dev/null
+++ b/libgcc/config/arm/parity.S
@@ -0,0 +1,120 @@
+/* parity.S: ARM optimized parity functions
+
+   Copyright (C) 2020-2021 Free Software Foundation, Inc.
+   Contributed by Daniel Engel (g...@danielengel.com)
+
+   This file is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by the
+   Free Software Foundation; either version 3, or (at your option) any
+   later version.
+
+   This file is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+
+#ifdef L_paritydi2
+
+// int __paritydi2(long long)
+// Returns '0' if the number of bits set in $r1:r0 is even, and '1' otherwise.
+// Returns the result in $r0.
+FUNC_START_SECTION paritydi2 .text.sorted.libgcc.paritydi2
+	CFI_START_FUNCTION
+
+	// Combine the upper and lower words, then fall through.
+	// Byte-endianness does not matter for this function.
+	eors	r0, r1
+
+#endif /* L_paritydi2 */
+
+
+// The implementation of __paritydi2() tightly couples with __paritysi2(),
+// such that instructions must appear consecutively in the same memory
+// section for proper flow control.  However, this construction inhibits
+// the ability to discard __paritydi2() when only using __paritysi2().
+// Therefore, this block configures __paritysi2() for compilation twice.
+// The first version is a minimal standalone implementation, and the second
+// version is the continuation of __paritydi2().  The standalone version must
+// be declared WEAK, so that the combined version can supersede it and
+// provide both symbols when required.
+// '_paritysi2' should appear before '_paritydi2' in LIB1ASMFUNCS.
+#if defined(L_paritysi2) || defined(L_paritydi2)
+
+#ifdef L_paritysi2
+// int __paritysi2(int)
+// Returns '0' if the number of bits set in $r0 is even, and '1' otherwise.
+// Returns the result in $r0.
+// Uses $r2 as scratch space.
+WEAK_START_SECTION paritysi2 .text.sorted.libgcc.paritysi2 +CFI_START_FUNCTION + +#else /* L_paritydi2 */ +FUNC_ENTRY paritysi2 + +#endif + + #if defined(__thumb__) && __thumb__ +#if defined(__OPTIMIZE_SIZE__) && __OPTIMIZE_SIZE__ + +// Size optimized: 16 bytes, 40 cycles +// Speed optimized: 24 bytes, 14 cycles +movsr2, #16 + +LLSYM(__parity_loop): +// Calculate the parity of successively smaller half-words into the MSB. +movsr1, r0 +lslsr1, r2 +eorsr0, r1 +lsrsr2, #1 +bne LLSYM(__parity_loop) + +#else /* !__OPTIMIZE_SIZE__ */ + +// Unroll the loop. The 'libgcc' reference C implementation replaces +// the x2 and the x1 shifts with a constant. However, since it takes +// 4 cycles to load, index, and mask the constant result, it doesn't +// cost anything to keep shifting (and saves a few bytes). +lslsr1, r0, #16 +eorsr0, r1 +lslsr1, r0,
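Both entry points reduce to the same folding idea. The asm above folds toward the MSB with left shifts; the standard right-shift formulation below computes the same bit (illustrative C model only, not the patch):

    int paritysi2_model (unsigned int x)
    {
        /* Each fold XORs the upper half onto the lower half, so the
           parity of the whole word accumulates in bit 0.  */
        x ^= x >> 16;
        x ^= x >> 8;
        x ^= x >> 4;
        x ^= x >> 2;
        x ^= x >> 1;
        return x & 1;
    }

    int paritydi2_model (unsigned long long x)
    {
        /* parity(a:b) == parity(a ^ b), matching the 'eors' prologue.  */
        return paritysi2_model ((unsigned int)x ^ (unsigned int)(x >> 32));
    }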
[PATCH v6 15/34] Import 'popcnt' functions from the CM0 library
The functional overlap between the single- and double-word functions makes this implementation about 30% smaller than the C functions if both functions are linked together in the same application.

gcc/libgcc/ChangeLog:
2021-01-13 Daniel Engel

	* config/arm/popcnt.S (__popcountsi2, __popcountdi2): New file.
	* config/arm/lib1funcs.S: #include popcnt.S.
	* config/arm/t-elf (LIB1ASMFUNCS): Add _popcountsi2/di2.
---
 libgcc/config/arm/lib1funcs.S |   1 +
 libgcc/config/arm/popcnt.S    | 189 ++
 libgcc/config/arm/t-elf       |   2 +
 3 files changed, 192 insertions(+)
 create mode 100644 libgcc/config/arm/popcnt.S

diff --git a/libgcc/config/arm/lib1funcs.S b/libgcc/config/arm/lib1funcs.S
index 600ea2dfdc9..bd84a3e4281 100644
--- a/libgcc/config/arm/lib1funcs.S
+++ b/libgcc/config/arm/lib1funcs.S
@@ -1705,6 +1705,7 @@ LSYM(Lover12):
 #include "clz2.S"
 #include "ctz2.S"
 #include "parity.S"
+#include "popcnt.S"
 
 /* */
 /* These next two sections are here despite the fact that they contain Thumb

diff --git a/libgcc/config/arm/popcnt.S b/libgcc/config/arm/popcnt.S
new file mode 100644
index 000..51b1ed745ee
--- /dev/null
+++ b/libgcc/config/arm/popcnt.S
@@ -0,0 +1,189 @@
+/* popcnt.S: ARM optimized popcount functions
+
+   Copyright (C) 2020-2021 Free Software Foundation, Inc.
+   Contributed by Daniel Engel (g...@danielengel.com)
+
+   This file is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by the
+   Free Software Foundation; either version 3, or (at your option) any
+   later version.
+
+   This file is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+
+#ifdef L_popcountdi2
+
+// int __popcountdi2(long long)
+// Returns the number of bits set in $r1:$r0.
+// Returns the result in $r0.
+FUNC_START_SECTION popcountdi2 .text.sorted.libgcc.popcountdi2
+	CFI_START_FUNCTION
+
+  #if defined(__OPTIMIZE_SIZE__) && __OPTIMIZE_SIZE__
+	// Initialize the result.
+	// Compensate for the two extra loops (one for each word)
+	// required to detect zero arguments.
+	movs	r2, #2
+
+LLSYM(__popcountd_loop):
+	// Same as __popcounts_loop below, except for $r1.
+	subs	r2, #1
+	subs	r3, r1, #1
+	ands	r1, r3
+	bcs	LLSYM(__popcountd_loop)
+
+	// Repeat the operation for the second word.
+	b	LLSYM(__popcounts_loop)
+
+  #else /* !__OPTIMIZE_SIZE__ */
+	// Load the one-bit alternating mask.
+	ldr	r3, =0x55555555
+
+	// Reduce the second word.
+	lsrs	r2, r1, #1
+	ands	r2, r3
+	subs	r1, r2
+
+	// Reduce the first word.
+	lsrs	r2, r0, #1
+	ands	r2, r3
+	subs	r0, r2
+
+	// Load the two-bit alternating mask.
+	ldr	r3, =0x33333333
+
+	// Reduce the second word.
+	lsrs	r2, r1, #2
+	ands	r2, r3
+	ands	r1, r3
+	adds	r1, r2
+
+	// Reduce the first word.
+	lsrs	r2, r0, #2
+	ands	r2, r3
+	ands	r0, r3
+	adds	r0, r2
+
+	// There will be a maximum of 8 bits in each 4-bit field.
+	// Jump into the single word flow to combine and complete.
+b LLSYM(__popcounts_merge) + + #endif /* !__OPTIMIZE_SIZE__ */ +#endif /* L_popcountdi2 */ + + +// The implementation of __popcountdi2() tightly couples with __popcountsi2(), +// such that instructions must appear consecutively in the same memory +// section for proper flow control. However, this construction inhibits +// the ability to discard __popcountdi2() when only using __popcountsi2(). +// Therefore, this block configures __popcountsi2() for compilation twice. +// The first version is a minimal standalone implementation, and the second +// version is the continuation of __popcountdi2(). The standalone version must +// be declared WEAK, so that the combined version can supersede it and +// provide both symbols when required. +// '_popcountsi2' should appear b
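The speed-optimized path above is the classic bit-slice (SWAR) reduction, while the -Os path loops on 'x &= x - 1' instead. A C model of the fast path (the alternating masks are the standard constants; the asm's exact merge sequence after the 4-bit stage is not reproduced here):

    int popcountsi2_model (unsigned int x)
    {
        x -= (x >> 1) & 0x55555555;                      /* 2-bit sums */
        x = (x & 0x33333333) + ((x >> 2) & 0x33333333);  /* 4-bit sums */
        x = (x + (x >> 4)) & 0x0F0F0F0F;                 /* 8-bit sums */
        return (int)((x * 0x01010101u) >> 24);           /* total      */
    }

    int popcountdi2_model (unsigned long long x)
    {
        return popcountsi2_model ((unsigned int)x)
             + popcountsi2_model ((unsigned int)(x >> 32));
    }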
[PATCH v6 16/34] Refactor Thumb-1 64-bit comparison into a new file
This will make it easier to isolate changes in subsequent patches. gcc/libgcc/ChangeLog: 2021-01-13 Daniel Engel * config/arm/bpabi-v6m.S (__aeabi_lcmp, __aeabi_ulcmp): Moved to ... * config/arm/eabi/lcmp.S: New file. * config/arm/lib1funcs.S: #include eabi/lcmp.S. --- libgcc/config/arm/bpabi-v6m.S | 46 -- libgcc/config/arm/eabi/lcmp.S | 73 +++ libgcc/config/arm/lib1funcs.S | 1 + 3 files changed, 74 insertions(+), 46 deletions(-) create mode 100644 libgcc/config/arm/eabi/lcmp.S diff --git a/libgcc/config/arm/bpabi-v6m.S b/libgcc/config/arm/bpabi-v6m.S index 069fcbbf48c..a051c1530a4 100644 --- a/libgcc/config/arm/bpabi-v6m.S +++ b/libgcc/config/arm/bpabi-v6m.S @@ -33,52 +33,6 @@ .eabi_attribute 25, 1 #endif /* __ARM_EABI__ */ -#ifdef L_aeabi_lcmp - -FUNC_START aeabi_lcmp - cmp xxh, yyh - beq 1f - bgt 2f - movsr0, #1 - negsr0, r0 - RET -2: - movsr0, #1 - RET -1: - subsr0, xxl, yyl - beq 1f - bhi 2f - movsr0, #1 - negsr0, r0 - RET -2: - movsr0, #1 -1: - RET - FUNC_END aeabi_lcmp - -#endif /* L_aeabi_lcmp */ - -#ifdef L_aeabi_ulcmp - -FUNC_START aeabi_ulcmp - cmp xxh, yyh - bne 1f - subsr0, xxl, yyl - beq 2f -1: - bcs 1f - movsr0, #1 - negsr0, r0 - RET -1: - movsr0, #1 -2: - RET - FUNC_END aeabi_ulcmp - -#endif /* L_aeabi_ulcmp */ .macro test_div_by_zero signed cmp yyh, #0 diff --git a/libgcc/config/arm/eabi/lcmp.S b/libgcc/config/arm/eabi/lcmp.S new file mode 100644 index 000..336db1d398c --- /dev/null +++ b/libgcc/config/arm/eabi/lcmp.S @@ -0,0 +1,73 @@ +/* Miscellaneous BPABI functions. Thumb-1 implementation, suitable for ARMv4T, + ARMv6-M and ARMv8-M Baseline like ISA variants. + + Copyright (C) 2006-2020 Free Software Foundation, Inc. + Contributed by CodeSourcery. + + This file is free software; you can redistribute it and/or modify it + under the terms of the GNU General Public License as published by the + Free Software Foundation; either version 3, or (at your option) any + later version. + + This file is distributed in the hope that it will be useful, but + WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + General Public License for more details. + + Under Section 7 of GPL version 3, you are granted additional + permissions described in the GCC Runtime Library Exception, version + 3.1, as published by the Free Software Foundation. + + You should have received a copy of the GNU General Public License and + a copy of the GCC Runtime Library Exception along with this program; + see the files COPYING3 and COPYING.RUNTIME respectively. If not, see + <http://www.gnu.org/licenses/>. */ + + +#ifdef L_aeabi_lcmp + +FUNC_START aeabi_lcmp +cmp xxh, yyh +beq 1f +bgt 2f +movsr0, #1 +negsr0, r0 +RET +2: +movsr0, #1 +RET +1: +subsr0, xxl, yyl +beq 1f +bhi 2f +movsr0, #1 +negsr0, r0 +RET +2: +movsr0, #1 +1: +RET +FUNC_END aeabi_lcmp + +#endif /* L_aeabi_lcmp */ + +#ifdef L_aeabi_ulcmp + +FUNC_START aeabi_ulcmp +cmp xxh, yyh +bne 1f +subsr0, xxl, yyl +beq 2f +1: +bcs 1f +movsr0, #1 +negsr0, r0 +RET +1: +movsr0, #1 +2: +RET +FUNC_END aeabi_ulcmp + +#endif /* L_aeabi_ulcmp */ + diff --git a/libgcc/config/arm/lib1funcs.S b/libgcc/config/arm/lib1funcs.S index bd84a3e4281..5e24d0a6749 100644 --- a/libgcc/config/arm/lib1funcs.S +++ b/libgcc/config/arm/lib1funcs.S @@ -1991,5 +1991,6 @@ LSYM(Lchange_\register): #include "bpabi.S" #else /* NOT_ISA_TARGET_32BIT */ #include "bpabi-v6m.S" +#include "eabi/lcmp.S" #endif /* NOT_ISA_TARGET_32BIT */ #endif /* !__symbian__ */ -- 2.25.1
[PATCH v6 17/34] Import 64-bit comparison from CM0 library
These are 2-5 instructions smaller and just as fast. Branches are minimized, which will allow easier adaptation to Thumb-2/ARM mode.

gcc/libgcc/ChangeLog:
2021-01-13 Daniel Engel

	* config/arm/eabi/lcmp.S (__aeabi_lcmp, __aeabi_ulcmp): Replaced;
	add macro configuration to build __cmpdi2() and __ucmpdi2().
	* config/arm/t-elf (LIB1ASMFUNCS): Added _cmpdi2 and _ucmpdi2.
---
 libgcc/config/arm/eabi/lcmp.S | 151 +-
 libgcc/config/arm/t-elf       |   2 +
 2 files changed, 112 insertions(+), 41 deletions(-)

diff --git a/libgcc/config/arm/eabi/lcmp.S b/libgcc/config/arm/eabi/lcmp.S
index 336db1d398c..2ac9d178b34 100644
--- a/libgcc/config/arm/eabi/lcmp.S
+++ b/libgcc/config/arm/eabi/lcmp.S
@@ -1,8 +1,7 @@
-/* Miscellaneous BPABI functions.  Thumb-1 implementation, suitable for ARMv4T,
-   ARMv6-M and ARMv8-M Baseline like ISA variants.
+/* lcmp.S: Thumb-1 optimized 64-bit integer comparison
 
-   Copyright (C) 2006-2020 Free Software Foundation, Inc.
-   Contributed by CodeSourcery.
+   Copyright (C) 2018-2021 Free Software Foundation, Inc.
+   Contributed by Daniel Engel, Senva Inc (g...@danielengel.com)
 
    This file is free software; you can redistribute it and/or modify it
    under the terms of the GNU General Public License as published by the
@@ -24,50 +23,120 @@
    <http://www.gnu.org/licenses/>.  */
 
 
+#if defined(L_aeabi_lcmp) || defined(L_cmpdi2)
+
 #ifdef L_aeabi_lcmp
+  #define LCMP_NAME aeabi_lcmp
+  #define LCMP_SECTION .text.sorted.libgcc.lcmp
+#else
+  #define LCMP_NAME cmpdi2
+  #define LCMP_SECTION .text.sorted.libgcc.cmpdi2
+#endif
+
+// int __aeabi_lcmp(long long, long long)
+// int __cmpdi2(long long, long long)
+// Compares the 64 bit signed values in $r1:$r0 and $r3:$r2.
+// lcmp() returns $r0 = { -1, 0, +1 } for orderings { <, ==, > } respectively.
+// cmpdi2() returns $r0 = { 0, 1, 2 } for orderings { <, ==, > } respectively.
+// Object file duplication assumes typical programs follow one runtime ABI.
+FUNC_START_SECTION LCMP_NAME LCMP_SECTION
+	CFI_START_FUNCTION
+
+	// Calculate the difference $r1:$r0 - $r3:$r2.
+	subs	xxl, yyl
+	sbcs	xxh, yyh
+
+	// With $r2 free, create a known offset value without affecting
+	// the N or Z flags.
+	// BUG? The originally unified instruction for v6m was 'mov r2, r3'.
+	// However, this resulted in a compile error with -mthumb:
+	// "MOV Rd, Rs with two low registers not permitted".
+	// Since unified syntax deprecates the "cpy" instruction, shouldn't
+	// there be a backwards-compatible translation available?
+	cpy	r2, r3
+
+	// Evaluate the comparison result.
+	blt	LLSYM(__lcmp_lt)
+
+	// The reference offset ($r2 - $r3) will be +2 iff the first
+	// argument is larger, otherwise the offset value remains 0.
+	adds	r2, #2
+
+	// Check for zero (equality in 64 bits).
+	// It doesn't matter which register was originally "hi".
+	orrs	r0, r1
+
+	// The result is already 0 on equality.
+	beq	LLSYM(__lcmp_return)
+
+LLSYM(__lcmp_lt):
+	// Create +1 or -1 from the offset value defined earlier.
+	adds	r3, #1
+	subs	r0, r2, r3
+
+LLSYM(__lcmp_return):
+  #ifdef L_cmpdi2
+	// Offset to the correct output specification.
+	adds	r0, #1
+  #endif
 
-FUNC_START aeabi_lcmp
-	cmp	xxh, yyh
-	beq	1f
-	bgt	2f
-	movs	r0, #1
-	negs	r0, r0
-	RET
-2:
-	movs	r0, #1
-	RET
-1:
-	subs	r0, xxl, yyl
-	beq	1f
-	bhi	2f
-	movs	r0, #1
-	negs	r0, r0
-	RET
-2:
-	movs	r0, #1
-1:
 	RET
-FUNC_END aeabi_lcmp
-#endif /* L_aeabi_lcmp */
+
+	CFI_END_FUNCTION
+FUNC_END LCMP_NAME
+
+#endif /* L_aeabi_lcmp || L_cmpdi2 */
+
+
+#if defined(L_aeabi_ulcmp) || defined(L_ucmpdi2)
 
 #ifdef L_aeabi_ulcmp
+  #define ULCMP_NAME aeabi_ulcmp
+  #define ULCMP_SECTION .text.sorted.libgcc.ulcmp
+#else
+  #define ULCMP_NAME ucmpdi2
+  #define ULCMP_SECTION .text.sorted.libgcc.ucmpdi2
+#endif
+
+// int __aeabi_ulcmp(unsigned long long, unsigned long long)
+// int __ucmpdi2(unsigned long long, unsigned long long)
+// Compares the 64 bit unsigned values in $r1:$r0 and $r3:$r2.
+// ulcmp() returns $r0 = { -1, 0, +1 } for orderings { <, ==, > } respectively.
+// ucmpdi2() returns $r0 = { 0, 1, 2 } for orderings { <, ==, > } respectively.
+// Object file duplication assumes typical programs follow one runtime ABI.
+FUNC_START_SECTION ULCMP_NAME ULCMP_SECTION
+	CFI_START_FUNCTION
+
+	// Calculate the 'C' flag.
+	subs	xxl, yyl
+	sbcs	xxh, yyh
+
+	// Capture the carry flag.
+	// $r2 wil
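Since the two return conventions are easy to mix up, here they are side by side (behavioral sketch only, not the patch's code):

    int lcmp_model (long long a, long long b)       /* __aeabi_lcmp */
    {
        return (a < b) ? -1 : (a > b) ? 1 : 0;
    }

    int cmpdi2_model (long long a, long long b)     /* __cmpdi2 */
    {
        /* Same ordering, biased by +1 to give { 0, 1, 2 }; this is
           the final 'adds r0, #1' in the L_cmpdi2 build above.  */
        return lcmp_model (a, b) + 1;
    }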