vapier      16/12/08 19:28:42

  Modified:             README.history
  Added:               
                        
00_all_0030-Fix-writes-past-the-allocated-array-bounds-in-execvp.patch
                        
00_all_0031-MIPS-Add-.insn-to-ensure-a-text-label-is-defined-as-.patch
                        
00_all_0032-x86_64-fix-static-build-of-__memcpy_chk-for-compiler.patch
                        
00_all_0033-X86-64-Add-_dl_runtime_resolve_avx-512-_-opt-slow-BZ.patch
                        
00_all_0034-X86-64-Add-_dl_runtime_resolve_avx-512-_-opt-slow-BZ.patch
                        00_all_0035-alpha-fix-ceil-on-sNaN-input.patch
                        00_all_0036-alpha-fix-floor-on-sNaN-input.patch
                        00_all_0037-alpha-fix-rint-on-sNaN-input.patch
                        00_all_0038-alpha-fix-trunc-for-big-input-values.patch
  Log:
  add patches from upstream mostly for alpha #581790

Revision  Changes    Path
1.3                  src/patchsets/glibc/2.24/README.history

file : 
http://sources.gentoo.org/viewvc.cgi/gentoo/src/patchsets/glibc/2.24/README.history?rev=1.3&view=markup
plain: 
http://sources.gentoo.org/viewvc.cgi/gentoo/src/patchsets/glibc/2.24/README.history?rev=1.3&content-type=text/plain
diff : 
http://sources.gentoo.org/viewvc.cgi/gentoo/src/patchsets/glibc/2.24/README.history?r1=1.2&r2=1.3

Index: README.history
===================================================================
RCS file: /var/cvsroot/gentoo/src/patchsets/glibc/2.24/README.history,v
retrieving revision 1.2
retrieving revision 1.3
diff -u -r1.2 -r1.3
--- README.history      15 Nov 2016 19:38:23 -0000      1.2
+++ README.history      8 Dec 2016 19:28:42 -0000       1.3
@@ -1,3 +1,14 @@
+3              08 Dec 2016
+       + 00_all_0030-Fix-writes-past-the-allocated-array-bounds-in-execvp.patch
+       + 00_all_0031-MIPS-Add-.insn-to-ensure-a-text-label-is-defined-as-.patch
+       + 00_all_0032-x86_64-fix-static-build-of-__memcpy_chk-for-compiler.patch
+       + 00_all_0033-X86-64-Add-_dl_runtime_resolve_avx-512-_-opt-slow-BZ.patch
+       + 00_all_0034-X86-64-Add-_dl_runtime_resolve_avx-512-_-opt-slow-BZ.patch
+       + 00_all_0035-alpha-fix-ceil-on-sNaN-input.patch
+       + 00_all_0036-alpha-fix-floor-on-sNaN-input.patch
+       + 00_all_0037-alpha-fix-rint-on-sNaN-input.patch
+       + 00_all_0038-alpha-fix-trunc-for-big-input-values.patch
+
 2              15 Nov 2016
        + 00_all_0029-configure-accept-__stack_chk_fail_local-for-ssp-supp.patch
 



1.1                  
src/patchsets/glibc/2.24/00_all_0030-Fix-writes-past-the-allocated-array-bounds-in-execvp.patch

file : 
http://sources.gentoo.org/viewvc.cgi/gentoo/src/patchsets/glibc/2.24/00_all_0030-Fix-writes-past-the-allocated-array-bounds-in-execvp.patch?rev=1.1&view=markup
plain: 
http://sources.gentoo.org/viewvc.cgi/gentoo/src/patchsets/glibc/2.24/00_all_0030-Fix-writes-past-the-allocated-array-bounds-in-execvp.patch?rev=1.1&content-type=text/plain

Index: 00_all_0030-Fix-writes-past-the-allocated-array-bounds-in-execvp.patch
===================================================================
>From 901db98f36690e4743feefd985c6ba2d7fd19813 Mon Sep 17 00:00:00 2001
From: Adhemerval Zanella <[email protected]>
Date: Mon, 21 Nov 2016 11:06:15 -0200
Subject: [PATCH] Fix writes past the allocated array bounds in execvpe
 (BZ#20847)

This patch fixes an invalid write out of a stack allocated buffer in
2 places in the execvpe implementation:

  1. On 'maybe_script_execute' function where it allocates the new
     argument list and it does not account that a minimum of argc
     plus 3 elements (default shell path, script name, arguments,
     and ending null pointer) should be considered.  The straightforward
     fix is just to take account of the correct list size on argument
     copy.

  2. On '__execvpe' where the executable file name length may not
     account for ending '\0' and thus subsequent path creation may
     write past array bounds because it requires to add the terminating
     null.  The fix is to change how to calculate the executable name
     size to add the final '\0' and adjust the rest of the code
     accordingly.

As described in GCC bug report 78433 [1], these issues were masked off by
GCC because it allocated several bytes more than necessary so that many
off-by-one bugs went unnoticed.

Checked on x86_64 with a latest GCC (7.0.0 20161121) with -O3 on CFLAGS.

        [BZ #20847]
        * posix/execvpe.c (maybe_script_execute): Remove write past allocated
        array bounds.
        (__execvpe): Likewise.

[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78433

(cherry picked from commit d174436712e3cabce70d6cd771f177b6fe0e097b)
---
 posix/execvpe.c | 15 ++++++++++-----
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/posix/execvpe.c b/posix/execvpe.c
index d933f9c92acf..7cdb06a6112e 100644
--- a/posix/execvpe.c
+++ b/posix/execvpe.c
@@ -48,12 +48,13 @@ maybe_script_execute (const char *file, char *const argv[], 
char *const envp[])
        }
     }
 
-  /* Construct an argument list for the shell.  */
+  /* Construct an argument list for the shell.  It will contain at minimum 3
+     arguments (current shell, script, and an ending NULL.  */
   char *new_argv[argc + 1];
   new_argv[0] = (char *) _PATH_BSHELL;
   new_argv[1] = (char *) file;
   if (argc > 1)
-    memcpy (new_argv + 2, argv + 1, argc * sizeof(char *));
+    memcpy (new_argv + 2, argv + 1, (argc - 1) * sizeof(char *));
   else
     new_argv[2] = NULL;
 
@@ -91,10 +92,11 @@ __execvpe (const char *file, char *const argv[], char 
*const envp[])
   /* Although GLIBC does not enforce NAME_MAX, we set it as the maximum
      size to avoid unbounded stack allocation.  Same applies for
      PATH_MAX.  */
-  size_t file_len = __strnlen (file, NAME_MAX + 1);
+  size_t file_len = __strnlen (file, NAME_MAX) + 1;
   size_t path_len = __strnlen (path, PATH_MAX - 1) + 1;
 
-  if ((file_len > NAME_MAX)
+  /* NAME_MAX does not include the terminating null character.  */
+  if (((file_len-1) > NAME_MAX)
       || !__libc_alloca_cutoff (path_len + file_len + 1))
     {
       errno = ENAMETOOLONG;
@@ -103,6 +105,9 @@ __execvpe (const char *file, char *const argv[], char 
*const envp[])
 
   const char *subp;
   bool got_eacces = false;
+  /* The resulting string maximum size would be potentially a entry
+     in PATH plus '/' (path_len + 1) and then the the resulting file name
+     plus '\0' (file_len since it already accounts for the '\0').  */
   char buffer[path_len + file_len + 1];
   for (const char *p = path; ; p = subp)
     {
@@ -123,7 +128,7 @@ __execvpe (const char *file, char *const argv[], char 
*const envp[])
          execute.  */
       char *pend = mempcpy (buffer, p, subp - p);
       *pend = '/';
-      memcpy (pend + (p < subp), file, file_len + 1);
+      memcpy (pend + (p < subp), file, file_len);
 
       __execve (buffer, argv, envp);
 
-- 
2.11.0.rc2




1.1                  
src/patchsets/glibc/2.24/00_all_0031-MIPS-Add-.insn-to-ensure-a-text-label-is-defined-as-.patch

file : 
http://sources.gentoo.org/viewvc.cgi/gentoo/src/patchsets/glibc/2.24/00_all_0031-MIPS-Add-.insn-to-ensure-a-text-label-is-defined-as-.patch?rev=1.1&view=markup
plain: 
http://sources.gentoo.org/viewvc.cgi/gentoo/src/patchsets/glibc/2.24/00_all_0031-MIPS-Add-.insn-to-ensure-a-text-label-is-defined-as-.patch?rev=1.1&content-type=text/plain

Index: 00_all_0031-MIPS-Add-.insn-to-ensure-a-text-label-is-defined-as-.patch
===================================================================
>From 0ab02a62e42e63b058e7a4e160dbe51762ef2c46 Mon Sep 17 00:00:00 2001
From: "Maciej W. Rozycki" <[email protected]>
Date: Thu, 17 Nov 2016 19:15:51 +0000
Subject: [PATCH] MIPS: Add `.insn' to ensure a text label is defined as code
 not data

Avoid a build error with microMIPS compilation and recent versions of
GAS which complain if a branch targets a label which is marked as data
rather than microMIPS code:

../sysdeps/mips/mips32/crti.S: Assembler messages:
../sysdeps/mips/mips32/crti.S:72: Error: branch to a symbol in another ISA mode
make[2]: *** [.../csu/crti.o] Error 1

as commit 9d862524f6ae ("MIPS: Verify the ISA mode and alignment of
branch and jump targets") closed a hole in branch processing, making
relocation calculation respect the ISA mode of the symbol referred.
This allowed diagnosing the situation where an attempt is made to pass
control from code assembled for one ISA mode to code assembled for a
different ISA mode and either relaxing the branch to a cross-mode jump
or if that is not possible, then reporting this as an error rather than
letting such code build and then fail unpredictably at the run time.

This however requires the correct annotation of branch targets as code,
because the ISA mode is not relevant for data symbols and is therefore
not recorded for them.  The `.insn' pseudo-op is used for this purpose
and has been supported by GAS since:

Wed Feb 12 14:36:29 1997  Ian Lance Taylor  <[email protected]>

        * config/tc-mips.c (mips_pseudo_table): Add "insn".
        (s_insn): New static function.
        * doc/c-mips.texi: Document .insn.

so there has been no reason to avoid it where required.  More recently
this pseudo-op has been documented, by the microMIPS architecture
specification[1][2], as required for the correct interpretation of any
code label which is not followed by an actual instruction in an assembly
source.

Use it in our crti.S files then, to mark that the trailing label there
with no instructions following is indeed not a code bug and the branch
is legitimate.

References:

[1] "MIPS Architecture for Programmers, Volume II-B: The microMIPS32
    Instruction Set", MIPS Technologies, Inc., Document Number: MD00582,
    Revision 5.04, January 15, 2014, Section 7.1 "Assembly-Level
    Compatibility", p. 533

[2] "MIPS Architecture for Programmers, Volume II-B: The microMIPS64
    Instruction Set", MIPS Technologies, Inc., Document Number: MD00594,
    Revision 5.04, January 15, 2014, Section 8.1 "Assembly-Level
    Compatibility", p. 623

2016-11-23  Matthew Fortune  <[email protected]>
            Maciej W. Rozycki  <[email protected]>

        * sysdeps/mips/mips32/crti.S (_init): Add `.insn' pseudo-op at
        `.Lno_weak_fn' label.
        * sysdeps/mips/mips64/n32/crti.S (_init): Likewise.
        * sysdeps/mips/mips64/n64/crti.S (_init): Likewise.

(cherry picked from commit cfaf1949ff1f8336b54c43796d0e2531bc8a40a2)
(cherry picked from commit 65a2b63756a4d622b938910d582d8b807c471c9a)
---
 sysdeps/mips/mips32/crti.S     | 1 +
 sysdeps/mips/mips64/n32/crti.S | 1 +
 sysdeps/mips/mips64/n64/crti.S | 1 +
 3 files changed, 3 insertions(+)

diff --git a/sysdeps/mips/mips32/crti.S b/sysdeps/mips/mips32/crti.S
index 5c0ad7328a81..dfbbdc4f8f78 100644
--- a/sysdeps/mips/mips32/crti.S
+++ b/sysdeps/mips/mips32/crti.S
@@ -74,6 +74,7 @@ _init:
        .reloc 1f,R_MIPS_JALR,PREINIT_FUNCTION
 1:     jalr $25
 .Lno_weak_fn:
+       .insn
 #else
        lw $25,%got(PREINIT_FUNCTION)($28)
        .reloc 1f,R_MIPS_JALR,PREINIT_FUNCTION
diff --git a/sysdeps/mips/mips64/n32/crti.S b/sysdeps/mips/mips64/n32/crti.S
index 00b89f3894ca..afe6d8edaae8 100644
--- a/sysdeps/mips/mips64/n32/crti.S
+++ b/sysdeps/mips/mips64/n32/crti.S
@@ -74,6 +74,7 @@ _init:
        .reloc 1f,R_MIPS_JALR,PREINIT_FUNCTION
 1:     jalr $25
 .Lno_weak_fn:
+       .insn
 #else
        lw $25,%got_disp(PREINIT_FUNCTION)($28)
        .reloc 1f,R_MIPS_JALR,PREINIT_FUNCTION
diff --git a/sysdeps/mips/mips64/n64/crti.S b/sysdeps/mips/mips64/n64/crti.S
index f59b20c63151..4049d29290ce 100644
--- a/sysdeps/mips/mips64/n64/crti.S
+++ b/sysdeps/mips/mips64/n64/crti.S
@@ -74,6 +74,7 @@ _init:
        .reloc 1f,R_MIPS_JALR,PREINIT_FUNCTION
 1:     jalr $25
 .Lno_weak_fn:
+       .insn
 #else
        ld $25,%got_disp(PREINIT_FUNCTION)($28)
        .reloc 1f,R_MIPS_JALR,PREINIT_FUNCTION
-- 
2.11.0.rc2




1.1                  
src/patchsets/glibc/2.24/00_all_0032-x86_64-fix-static-build-of-__memcpy_chk-for-compiler.patch

file : 
http://sources.gentoo.org/viewvc.cgi/gentoo/src/patchsets/glibc/2.24/00_all_0032-x86_64-fix-static-build-of-__memcpy_chk-for-compiler.patch?rev=1.1&view=markup
plain: 
http://sources.gentoo.org/viewvc.cgi/gentoo/src/patchsets/glibc/2.24/00_all_0032-x86_64-fix-static-build-of-__memcpy_chk-for-compiler.patch?rev=1.1&content-type=text/plain

Index: 00_all_0032-x86_64-fix-static-build-of-__memcpy_chk-for-compiler.patch
===================================================================
>From 0d5f4a32a34f048b35360a110a0e6d1c87e3eced Mon Sep 17 00:00:00 2001
From: Aurelien Jarno <[email protected]>
Date: Thu, 24 Nov 2016 12:10:13 +0100
Subject: [PATCH] x86_64: fix static build of __memcpy_chk for compilers
 defaulting to PIC/PIE

When glibc is compiled with gcc 6.2 that has been configured
to default to PIC/PIE, the static version of __memcpy_chk is not built,
as the test is done on PIC instead of SHARED. Fix the test to check for
SHARED, like it is done for similar functions like memmove_chk.

Changelog:
        * sysdeps/x86_64/memcpy_chk.S (__memcpy_chk): Check for SHARED
        instead of PIC.

(cherry picked from commit 380ec16d62f459d5a28cfc25b7b20990c45e1cc9)
(cherry picked from commit 2d16e81babd1d7b66d10cec0bc6d6d86a7e0c95e)
---
 sysdeps/x86_64/memcpy_chk.S | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/sysdeps/x86_64/memcpy_chk.S b/sysdeps/x86_64/memcpy_chk.S
index 2296b55119bc..a95b3ad3cff4 100644
--- a/sysdeps/x86_64/memcpy_chk.S
+++ b/sysdeps/x86_64/memcpy_chk.S
@@ -19,7 +19,7 @@
 #include <sysdep.h>
 #include "asm-syntax.h"
 
-#ifndef PIC
+#ifndef SHARED
        /* For libc.so this is defined in memcpy.S.
           For libc.a, this is a separate source to avoid
           memcpy bringing in __chk_fail and all routines
-- 
2.11.0.rc2




1.1                  
src/patchsets/glibc/2.24/00_all_0033-X86-64-Add-_dl_runtime_resolve_avx-512-_-opt-slow-BZ.patch

file : 
http://sources.gentoo.org/viewvc.cgi/gentoo/src/patchsets/glibc/2.24/00_all_0033-X86-64-Add-_dl_runtime_resolve_avx-512-_-opt-slow-BZ.patch?rev=1.1&view=markup
plain: 
http://sources.gentoo.org/viewvc.cgi/gentoo/src/patchsets/glibc/2.24/00_all_0033-X86-64-Add-_dl_runtime_resolve_avx-512-_-opt-slow-BZ.patch?rev=1.1&content-type=text/plain

Index: 00_all_0033-X86-64-Add-_dl_runtime_resolve_avx-512-_-opt-slow-BZ.patch
===================================================================
>From b4391b0c7def246a4503db1af683122681c12a56 Mon Sep 17 00:00:00 2001
From: "H.J. Lu" <[email protected]>
Date: Tue, 6 Sep 2016 08:50:55 -0700
Subject: [PATCH] X86-64: Add _dl_runtime_resolve_avx[512]_{opt|slow} [BZ
 #20508]

There is transition penalty when SSE instructions are mixed with 256-bit
AVX or 512-bit AVX512 load instructions.  Since _dl_runtime_resolve_avx
and _dl_runtime_profile_avx512 save/restore 256-bit YMM/512-bit ZMM
registers, there is transition penalty when SSE instructions are used
with lazy binding on AVX and AVX512 processors.

To avoid SSE transition penalty, if only the lower 128 bits of the first
8 vector registers are non-zero, we can preserve %xmm0 - %xmm7 registers
with the zero upper bits.

For AVX and AVX512 processors which support XGETBV with ECX == 1, we can
use XGETBV with ECX == 1 to check if the upper 128 bits of YMM registers
or the upper 256 bits of ZMM registers are zero.  We can restore only the
non-zero portion of vector registers with AVX/AVX512 load instructions
which will zero-extend upper bits of vector registers.

This patch adds _dl_runtime_resolve_sse_vex which saves and restores
XMM registers with 128-bit AVX store/load instructions.  It is used to
preserve YMM/ZMM registers when only the lower 128 bits are non-zero.
_dl_runtime_resolve_avx_opt and _dl_runtime_resolve_avx512_opt are added
and used on AVX/AVX512 processors supporting XGETBV with ECX == 1 so
that we store and load only the non-zero portion of vector registers.
This avoids SSE transition penalty caused by _dl_runtime_resolve_avx and
_dl_runtime_profile_avx512 when only the lower 128 bits of vector
registers are used.

_dl_runtime_resolve_avx_slow is added and used for AVX processors which
don't support XGETBV with ECX == 1.  Since there is no SSE transition
penalty on AVX512 processors which don't support XGETBV with ECX == 1,
_dl_runtime_resolve_avx512_slow isn't provided.

        [BZ #20495]
        [BZ #20508]
        * sysdeps/x86/cpu-features.c (init_cpu_features): For Intel
        processors, set Use_dl_runtime_resolve_slow and set
        Use_dl_runtime_resolve_opt if XGETBV supports ECX == 1.
        * sysdeps/x86/cpu-features.h (bit_arch_Use_dl_runtime_resolve_opt):
        New.
        (bit_arch_Use_dl_runtime_resolve_slow): Likewise.
        (index_arch_Use_dl_runtime_resolve_opt): Likewise.
        (index_arch_Use_dl_runtime_resolve_slow): Likewise.
        * sysdeps/x86_64/dl-machine.h (elf_machine_runtime_setup): Use
        _dl_runtime_resolve_avx512_opt and _dl_runtime_resolve_avx_opt
        if Use_dl_runtime_resolve_opt is set.  Use
        _dl_runtime_resolve_slow if Use_dl_runtime_resolve_slow is set.
        * sysdeps/x86_64/dl-trampoline.S: Include <cpu-features.h>.
        (_dl_runtime_resolve_opt): New.  Defined for AVX and AVX512.
        (_dl_runtime_resolve): Add one for _dl_runtime_resolve_sse_vex.
        * sysdeps/x86_64/dl-trampoline.h (_dl_runtime_resolve_avx_slow):
        New.
        (_dl_runtime_resolve_opt): Likewise.
        (_dl_runtime_profile): Define only if _dl_runtime_profile is
        defined.

(cherry picked from commit fb0f7a6755c1bfaec38f490fbfcaa39a66ee3604)
(cherry picked from commit 4b8790c81c1a7b870a43810ec95e08a2e501123d)
---
 sysdeps/x86/cpu-features.c     |  14 ++++++
 sysdeps/x86/cpu-features.h     |   6 +++
 sysdeps/x86_64/dl-machine.h    |  24 +++++++++-
 sysdeps/x86_64/dl-trampoline.S |  20 ++++++++
 sysdeps/x86_64/dl-trampoline.h | 104 ++++++++++++++++++++++++++++++++++++++++-
 5 files changed, 165 insertions(+), 3 deletions(-)

diff --git a/sysdeps/x86/cpu-features.c b/sysdeps/x86/cpu-features.c
index 9ce4b495a5e2..11b9af223195 100644
--- a/sysdeps/x86/cpu-features.c
+++ b/sysdeps/x86/cpu-features.c
@@ -205,6 +205,20 @@ init_cpu_features (struct cpu_features *cpu_features)
       if (CPU_FEATURES_ARCH_P (cpu_features, AVX2_Usable))
        cpu_features->feature[index_arch_AVX_Fast_Unaligned_Load]
          |= bit_arch_AVX_Fast_Unaligned_Load;
+
+      /* To avoid SSE transition penalty, use _dl_runtime_resolve_slow.
+         If XGETBV suports ECX == 1, use _dl_runtime_resolve_opt.  */
+      cpu_features->feature[index_arch_Use_dl_runtime_resolve_slow]
+       |= bit_arch_Use_dl_runtime_resolve_slow;
+      if (cpu_features->max_cpuid >= 0xd)
+       {
+         unsigned int eax;
+
+         __cpuid_count (0xd, 1, eax, ebx, ecx, edx);
+         if ((eax & (1 << 2)) != 0)
+           cpu_features->feature[index_arch_Use_dl_runtime_resolve_opt]
+             |= bit_arch_Use_dl_runtime_resolve_opt;
+       }
     }
   /* This spells out "AuthenticAMD".  */
   else if (ebx == 0x68747541 && ecx == 0x444d4163 && edx == 0x69746e65)
diff --git a/sysdeps/x86/cpu-features.h b/sysdeps/x86/cpu-features.h
index 97ffe765f4e0..a8b5a734bd4b 100644
--- a/sysdeps/x86/cpu-features.h
+++ b/sysdeps/x86/cpu-features.h
@@ -37,6 +37,8 @@
 #define bit_arch_Prefer_No_VZEROUPPER          (1 << 17)
 #define bit_arch_Fast_Unaligned_Copy           (1 << 18)
 #define bit_arch_Prefer_ERMS                   (1 << 19)
+#define bit_arch_Use_dl_runtime_resolve_opt    (1 << 20)
+#define bit_arch_Use_dl_runtime_resolve_slow   (1 << 21)
 
 /* CPUID Feature flags.  */
 
@@ -107,6 +109,8 @@
 # define index_arch_Prefer_No_VZEROUPPER FEATURE_INDEX_1*FEATURE_SIZE
 # define index_arch_Fast_Unaligned_Copy        FEATURE_INDEX_1*FEATURE_SIZE
 # define index_arch_Prefer_ERMS                FEATURE_INDEX_1*FEATURE_SIZE
+# define index_arch_Use_dl_runtime_resolve_opt FEATURE_INDEX_1*FEATURE_SIZE
+# define index_arch_Use_dl_runtime_resolve_slow FEATURE_INDEX_1*FEATURE_SIZE
 
 
 # if defined (_LIBC) && !IS_IN (nonlib)
@@ -277,6 +281,8 @@ extern const struct cpu_features *__get_cpu_features (void)
 # define index_arch_Prefer_No_VZEROUPPER FEATURE_INDEX_1
 # define index_arch_Fast_Unaligned_Copy        FEATURE_INDEX_1
 # define index_arch_Prefer_ERMS                FEATURE_INDEX_1
+# define index_arch_Use_dl_runtime_resolve_opt FEATURE_INDEX_1
+# define index_arch_Use_dl_runtime_resolve_slow FEATURE_INDEX_1
 
 #endif /* !__ASSEMBLER__ */
 
diff --git a/sysdeps/x86_64/dl-machine.h b/sysdeps/x86_64/dl-machine.h
index ed0c1a8efd1b..c0f0fa16a23b 100644
--- a/sysdeps/x86_64/dl-machine.h
+++ b/sysdeps/x86_64/dl-machine.h
@@ -68,7 +68,10 @@ elf_machine_runtime_setup (struct link_map *l, int lazy, int 
profile)
   Elf64_Addr *got;
   extern void _dl_runtime_resolve_sse (ElfW(Word)) attribute_hidden;
   extern void _dl_runtime_resolve_avx (ElfW(Word)) attribute_hidden;
+  extern void _dl_runtime_resolve_avx_slow (ElfW(Word)) attribute_hidden;
+  extern void _dl_runtime_resolve_avx_opt (ElfW(Word)) attribute_hidden;
   extern void _dl_runtime_resolve_avx512 (ElfW(Word)) attribute_hidden;
+  extern void _dl_runtime_resolve_avx512_opt (ElfW(Word)) attribute_hidden;
   extern void _dl_runtime_profile_sse (ElfW(Word)) attribute_hidden;
   extern void _dl_runtime_profile_avx (ElfW(Word)) attribute_hidden;
   extern void _dl_runtime_profile_avx512 (ElfW(Word)) attribute_hidden;
@@ -118,9 +121,26 @@ elf_machine_runtime_setup (struct link_map *l, int lazy, 
int profile)
             indicated by the offset on the stack, and then jump to
             the resolved address.  */
          if (HAS_ARCH_FEATURE (AVX512F_Usable))
-           *(ElfW(Addr) *) (got + 2) = (ElfW(Addr)) 
&_dl_runtime_resolve_avx512;
+           {
+             if (HAS_ARCH_FEATURE (Use_dl_runtime_resolve_opt))
+               *(ElfW(Addr) *) (got + 2)
+                 = (ElfW(Addr)) &_dl_runtime_resolve_avx512_opt;
+             else
+               *(ElfW(Addr) *) (got + 2)
+                 = (ElfW(Addr)) &_dl_runtime_resolve_avx512;
+           }
          else if (HAS_ARCH_FEATURE (AVX_Usable))
-           *(ElfW(Addr) *) (got + 2) = (ElfW(Addr)) &_dl_runtime_resolve_avx;
+           {
+             if (HAS_ARCH_FEATURE (Use_dl_runtime_resolve_opt))
+               *(ElfW(Addr) *) (got + 2)
+                 = (ElfW(Addr)) &_dl_runtime_resolve_avx_opt;
+             else if (HAS_ARCH_FEATURE (Use_dl_runtime_resolve_slow))
+               *(ElfW(Addr) *) (got + 2)
+                 = (ElfW(Addr)) &_dl_runtime_resolve_avx_slow;
+             else
+               *(ElfW(Addr) *) (got + 2)
+                 = (ElfW(Addr)) &_dl_runtime_resolve_avx;
+           }
          else
            *(ElfW(Addr) *) (got + 2) = (ElfW(Addr)) &_dl_runtime_resolve_sse;
        }
diff --git a/sysdeps/x86_64/dl-trampoline.S b/sysdeps/x86_64/dl-trampoline.S
index 12f1a5cf8485..39f595e1e185 100644
--- a/sysdeps/x86_64/dl-trampoline.S
+++ b/sysdeps/x86_64/dl-trampoline.S
@@ -18,6 +18,7 @@
 
 #include <config.h>
 #include <sysdep.h>
+#include <cpu-features.h>
 #include <link-defines.h>
 
 #ifndef DL_STACK_ALIGNMENT
@@ -86,9 +87,11 @@
 #endif
 #define VEC(i)                 zmm##i
 #define _dl_runtime_resolve    _dl_runtime_resolve_avx512
+#define _dl_runtime_resolve_opt        _dl_runtime_resolve_avx512_opt
 #define _dl_runtime_profile    _dl_runtime_profile_avx512
 #include "dl-trampoline.h"
 #undef _dl_runtime_resolve
+#undef _dl_runtime_resolve_opt
 #undef _dl_runtime_profile
 #undef VEC
 #undef VMOV
@@ -104,9 +107,11 @@
 #endif
 #define VEC(i)                 ymm##i
 #define _dl_runtime_resolve    _dl_runtime_resolve_avx
+#define _dl_runtime_resolve_opt        _dl_runtime_resolve_avx_opt
 #define _dl_runtime_profile    _dl_runtime_profile_avx
 #include "dl-trampoline.h"
 #undef _dl_runtime_resolve
+#undef _dl_runtime_resolve_opt
 #undef _dl_runtime_profile
 #undef VEC
 #undef VMOV
@@ -126,3 +131,18 @@
 #define _dl_runtime_profile    _dl_runtime_profile_sse
 #undef RESTORE_AVX
 #include "dl-trampoline.h"
+#undef _dl_runtime_resolve
+#undef _dl_runtime_profile
+#undef VMOV
+#undef VMOVA
+
+/* Used by _dl_runtime_resolve_avx_opt/_dl_runtime_resolve_avx512_opt
+   to preserve the full vector registers with zero upper bits.  */
+#define VMOVA                  vmovdqa
+#if DL_RUNTIME_RESOLVE_REALIGN_STACK || VEC_SIZE <= DL_STACK_ALIGNMENT
+# define VMOV                  vmovdqa
+#else
+# define VMOV                  vmovdqu
+#endif
+#define _dl_runtime_resolve    _dl_runtime_resolve_sse_vex
+#include "dl-trampoline.h"
diff --git a/sysdeps/x86_64/dl-trampoline.h b/sysdeps/x86_64/dl-trampoline.h
index b90836ab137f..abe4471c1de8 100644
--- a/sysdeps/x86_64/dl-trampoline.h
+++ b/sysdeps/x86_64/dl-trampoline.h
@@ -50,6 +50,105 @@
 #endif
 
        .text
+#ifdef _dl_runtime_resolve_opt
+/* Use the smallest vector registers to preserve the full YMM/ZMM
+   registers to avoid SSE transition penalty.  */
+
+# if VEC_SIZE == 32
+/* Check if the upper 128 bits in %ymm0 - %ymm7 registers are non-zero
+   and preserve %xmm0 - %xmm7 registers with the zero upper bits.  Since
+   there is no SSE transition penalty on AVX512 processors which don't
+   support XGETBV with ECX == 1, _dl_runtime_resolve_avx512_slow isn't
+   provided.   */
+       .globl _dl_runtime_resolve_avx_slow
+       .hidden _dl_runtime_resolve_avx_slow
+       .type _dl_runtime_resolve_avx_slow, @function
+       .align 16
+_dl_runtime_resolve_avx_slow:
+       cfi_startproc
+       cfi_adjust_cfa_offset(16) # Incorporate PLT
+       vorpd %ymm0, %ymm1, %ymm8
+       vorpd %ymm2, %ymm3, %ymm9
+       vorpd %ymm4, %ymm5, %ymm10
+       vorpd %ymm6, %ymm7, %ymm11
+       vorpd %ymm8, %ymm9, %ymm9
+       vorpd %ymm10, %ymm11, %ymm10
+       vpcmpeqd %xmm8, %xmm8, %xmm8
+       vorpd %ymm9, %ymm10, %ymm10
+       vptest %ymm10, %ymm8
+       # Preserve %ymm0 - %ymm7 registers if the upper 128 bits of any
+       # %ymm0 - %ymm7 registers aren't zero.
+       PRESERVE_BND_REGS_PREFIX
+       jnc _dl_runtime_resolve_avx
+       # Use vzeroupper to avoid SSE transition penalty.
+       vzeroupper
+       # Preserve %xmm0 - %xmm7 registers with the zero upper 128 bits
+       # when the upper 128 bits of %ymm0 - %ymm7 registers are zero.
+       PRESERVE_BND_REGS_PREFIX
+       jmp _dl_runtime_resolve_sse_vex
+       cfi_adjust_cfa_offset(-16) # Restore PLT adjustment
+       cfi_endproc
+       .size _dl_runtime_resolve_avx_slow, .-_dl_runtime_resolve_avx_slow
+# endif
+
+/* Use XGETBV with ECX == 1 to check which bits in vector registers are
+   non-zero and only preserve the non-zero lower bits with zero upper
+   bits.  */
+       .globl _dl_runtime_resolve_opt
+       .hidden _dl_runtime_resolve_opt
+       .type _dl_runtime_resolve_opt, @function
+       .align 16
+_dl_runtime_resolve_opt:
+       cfi_startproc
+       cfi_adjust_cfa_offset(16) # Incorporate PLT
+       pushq %rax
+       cfi_adjust_cfa_offset(8)
+       cfi_rel_offset(%rax, 0)
+       pushq %rcx
+       cfi_adjust_cfa_offset(8)
+       cfi_rel_offset(%rcx, 0)
+       pushq %rdx
+       cfi_adjust_cfa_offset(8)
+       cfi_rel_offset(%rdx, 0)
+       movl $1, %ecx
+       xgetbv
+       movl %eax, %r11d
+       popq %rdx
+       cfi_adjust_cfa_offset(-8)
+       cfi_restore (%rdx)
+       popq %rcx
+       cfi_adjust_cfa_offset(-8)
+       cfi_restore (%rcx)
+       popq %rax
+       cfi_adjust_cfa_offset(-8)
+       cfi_restore (%rax)
+# if VEC_SIZE == 32
+       # For YMM registers, check if YMM state is in use.
+       andl $bit_YMM_state, %r11d
+       # Preserve %xmm0 - %xmm7 registers with the zero upper 128 bits if
+       # YMM state isn't in use.
+       PRESERVE_BND_REGS_PREFIX
+       jz _dl_runtime_resolve_sse_vex
+# elif VEC_SIZE == 64
+       # For ZMM registers, check if YMM state and ZMM state are in
+       # use.
+       andl $(bit_YMM_state | bit_ZMM0_15_state), %r11d
+       cmpl $bit_YMM_state, %r11d
+       # Preserve %xmm0 - %xmm7 registers with the zero upper 384 bits if
+       # neither YMM state nor ZMM state are in use.
+       PRESERVE_BND_REGS_PREFIX
+       jl _dl_runtime_resolve_sse_vex
+       # Preserve %ymm0 - %ymm7 registers with the zero upper 256 bits if
+       # ZMM state isn't in use.
+       PRESERVE_BND_REGS_PREFIX
+       je _dl_runtime_resolve_avx
+# else
+#  error Unsupported VEC_SIZE!
+# endif
+       cfi_adjust_cfa_offset(-16) # Restore PLT adjustment
+       cfi_endproc
+       .size _dl_runtime_resolve_opt, .-_dl_runtime_resolve_opt
+#endif
        .globl _dl_runtime_resolve
        .hidden _dl_runtime_resolve
        .type _dl_runtime_resolve, @function
@@ -162,7 +261,10 @@ _dl_runtime_resolve:
        .size _dl_runtime_resolve, .-_dl_runtime_resolve
 
 
-#ifndef PROF
+/* To preserve %xmm0 - %xmm7 registers, dl-trampoline.h is included
+   twice, for _dl_runtime_resolve_sse and _dl_runtime_resolve_sse_vex.
+   But we don't need another _dl_runtime_profile for XMM registers.  */
+#if !defined PROF && defined _dl_runtime_profile
 # if (LR_VECTOR_OFFSET % VEC_SIZE) != 0
 #  error LR_VECTOR_OFFSET must be multples of VEC_SIZE
 # endif
-- 
2.11.0.rc2




1.1                  
src/patchsets/glibc/2.24/00_all_0034-X86-64-Add-_dl_runtime_resolve_avx-512-_-opt-slow-BZ.patch

file : 
http://sources.gentoo.org/viewvc.cgi/gentoo/src/patchsets/glibc/2.24/00_all_0034-X86-64-Add-_dl_runtime_resolve_avx-512-_-opt-slow-BZ.patch?rev=1.1&view=markup
plain: 
http://sources.gentoo.org/viewvc.cgi/gentoo/src/patchsets/glibc/2.24/00_all_0034-X86-64-Add-_dl_runtime_resolve_avx-512-_-opt-slow-BZ.patch?rev=1.1&content-type=text/plain

Index: 00_all_0034-X86-64-Add-_dl_runtime_resolve_avx-512-_-opt-slow-BZ.patch
===================================================================
>From df13b9c22a0fb690a0ab9dd4af163ae3c459d975 Mon Sep 17 00:00:00 2001
From: "H.J. Lu" <[email protected]>
Date: Tue, 6 Sep 2016 08:50:55 -0700
Subject: [PATCH] X86-64: Add _dl_runtime_resolve_avx[512]_{opt|slow} [BZ
 #20508]

There is transition penalty when SSE instructions are mixed with 256-bit
AVX or 512-bit AVX512 load instructions.  Since _dl_runtime_resolve_avx
and _dl_runtime_profile_avx512 save/restore 256-bit YMM/512-bit ZMM
registers, there is transition penalty when SSE instructions are used
with lazy binding on AVX and AVX512 processors.

To avoid SSE transition penalty, if only the lower 128 bits of the first
8 vector registers are non-zero, we can preserve %xmm0 - %xmm7 registers
with the zero upper bits.

For AVX and AVX512 processors which support XGETBV with ECX == 1, we can
use XGETBV with ECX == 1 to check if the upper 128 bits of YMM registers
or the upper 256 bits of ZMM registers are zero.  We can restore only the
non-zero portion of vector registers with AVX/AVX512 load instructions
which will zero-extend upper bits of vector registers.

This patch adds _dl_runtime_resolve_sse_vex which saves and restores
XMM registers with 128-bit AVX store/load instructions.  It is used to
preserve YMM/ZMM registers when only the lower 128 bits are non-zero.
_dl_runtime_resolve_avx_opt and _dl_runtime_resolve_avx512_opt are added
and used on AVX/AVX512 processors supporting XGETBV with ECX == 1 so
that we store and load only the non-zero portion of vector registers.
This avoids SSE transition penalty caused by _dl_runtime_resolve_avx and
_dl_runtime_profile_avx512 when only the lower 128 bits of vector
registers are used.

_dl_runtime_resolve_avx_slow is added and used for AVX processors which
don't support XGETBV with ECX == 1.  Since there is no SSE transition
penalty on AVX512 processors which don't support XGETBV with ECX == 1,
_dl_runtime_resolve_avx512_slow isn't provided.

        [BZ #20495]
        [BZ #20508]
        * sysdeps/x86/cpu-features.c (init_cpu_features): For Intel
        processors, set Use_dl_runtime_resolve_slow and set
        Use_dl_runtime_resolve_opt if XGETBV suports ECX == 1.
        * sysdeps/x86/cpu-features.h (bit_arch_Use_dl_runtime_resolve_opt):
        New.
        (bit_arch_Use_dl_runtime_resolve_slow): Likewise.
        (index_arch_Use_dl_runtime_resolve_opt): Likewise.
        (index_arch_Use_dl_runtime_resolve_slow): Likewise.
        * sysdeps/x86_64/dl-machine.h (elf_machine_runtime_setup): Use
        _dl_runtime_resolve_avx512_opt and _dl_runtime_resolve_avx_opt
        if Use_dl_runtime_resolve_opt is set.  Use
        _dl_runtime_resolve_slow if Use_dl_runtime_resolve_slow is set.
        * sysdeps/x86_64/dl-trampoline.S: Include <cpu-features.h>.
        (_dl_runtime_resolve_opt): New.  Defined for AVX and AVX512.
        (_dl_runtime_resolve): Add one for _dl_runtime_resolve_sse_vex.
        * sysdeps/x86_64/dl-trampoline.h (_dl_runtime_resolve_avx_slow):
        New.
        (_dl_runtime_resolve_opt): Likewise.
        (_dl_runtime_profile): Define only if _dl_runtime_profile is
        defined.

(cherry picked from commit fb0f7a6755c1bfaec38f490fbfcaa39a66ee3604)
---
 ChangeLog | 25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)

diff --git a/ChangeLog b/ChangeLog
index a51771c97668..406a1f2ee451 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,28 @@
+2016-11-30  H.J. Lu  <[email protected]>
+
+       [BZ #20495]
+       [BZ #20508]
+       * sysdeps/x86/cpu-features.c (init_cpu_features): For Intel
+       processors, set Use_dl_runtime_resolve_slow and set
+       Use_dl_runtime_resolve_opt if XGETBV suports ECX == 1.
+       * sysdeps/x86/cpu-features.h (bit_arch_Use_dl_runtime_resolve_opt):
+       New.
+       (bit_arch_Use_dl_runtime_resolve_slow): Likewise.
+       (index_arch_Use_dl_runtime_resolve_opt): Likewise.
+       (index_arch_Use_dl_runtime_resolve_slow): Likewise.
+       * sysdeps/x86_64/dl-machine.h (elf_machine_runtime_setup): Use
+       _dl_runtime_resolve_avx512_opt and _dl_runtime_resolve_avx_opt
+       if Use_dl_runtime_resolve_opt is set.  Use
+       _dl_runtime_resolve_slow if Use_dl_runtime_resolve_slow is set.
+       * sysdeps/x86_64/dl-trampoline.S: Include <cpu-features.h>.
+       (_dl_runtime_resolve_opt): New.  Defined for AVX and AVX512.
+       (_dl_runtime_resolve): Add one for _dl_runtime_resolve_sse_vex.
+       * sysdeps/x86_64/dl-trampoline.h (_dl_runtime_resolve_avx_slow):
+       New.
+       (_dl_runtime_resolve_opt): Likewise.
+       (_dl_runtime_profile): Define only if _dl_runtime_profile is
+       defined.
+
 2016-11-03  Joseph Myers  <[email protected]>
 
        * conform/Makefile ($(linknamespace-header-tests)): Also depend on
-- 
2.11.0.rc2




1.1                  
src/patchsets/glibc/2.24/00_all_0035-alpha-fix-ceil-on-sNaN-input.patch

file : 
http://sources.gentoo.org/viewvc.cgi/gentoo/src/patchsets/glibc/2.24/00_all_0035-alpha-fix-ceil-on-sNaN-input.patch?rev=1.1&view=markup
plain: 
http://sources.gentoo.org/viewvc.cgi/gentoo/src/patchsets/glibc/2.24/00_all_0035-alpha-fix-ceil-on-sNaN-input.patch?rev=1.1&content-type=text/plain

Index: 00_all_0035-alpha-fix-ceil-on-sNaN-input.patch
===================================================================
>From 2afb8a945ddc104c5ef9aa61f32427c19b681232 Mon Sep 17 00:00:00 2001
From: Aurelien Jarno <[email protected]>
Date: Tue, 2 Aug 2016 09:18:59 +0200
Subject: [PATCH] alpha: fix ceil on sNaN input

The alpha version of ceil wrongly return sNaN for sNaN input. Fix that
by checking for NaN and by returning the input value added with itself
in that case.

Finally remove the code to handle inexact exception, ceil should never
generate such an exception.

Changelog:
        * sysdeps/alpha/fpu/s_ceil.c (__ceil): Add argument with itself
        when it is a NaN.
        [_IEEE_FP_INEXACT] Remove.
        * sysdeps/alpha/fpu/s_ceilf.c (__ceilf): Likewise.

(cherry picked from commit 062e53c195b4a87754632c7d51254867247698b4)
(cherry picked from commit 3eff6f84311d2679a58a637e3be78b4ced275762)
---
 sysdeps/alpha/fpu/s_ceil.c  | 7 +++----
 sysdeps/alpha/fpu/s_ceilf.c | 7 +++----
 2 files changed, 6 insertions(+), 8 deletions(-)

diff --git a/sysdeps/alpha/fpu/s_ceil.c b/sysdeps/alpha/fpu/s_ceil.c
index c1ff864d4b86..e9c350af1cc0 100644
--- a/sysdeps/alpha/fpu/s_ceil.c
+++ b/sysdeps/alpha/fpu/s_ceil.c
@@ -26,17 +26,16 @@
 double
 __ceil (double x)
 {
+  if (isnan (x))
+    return x + x;
+
   if (isless (fabs (x), 9007199254740992.0))   /* 1 << DBL_MANT_DIG */
     {
       double tmp1, new_x;
 
       new_x = -x;
       __asm (
-#ifdef _IEEE_FP_INEXACT
-            "cvttq/svim %2,%1\n\t"
-#else
             "cvttq/svm %2,%1\n\t"
-#endif
             "cvtqt/m %1,%0\n\t"
             : "=f"(new_x), "=&f"(tmp1)
             : "f"(new_x));
diff --git a/sysdeps/alpha/fpu/s_ceilf.c b/sysdeps/alpha/fpu/s_ceilf.c
index 7e63a6fe94e7..77e01a99f743 100644
--- a/sysdeps/alpha/fpu/s_ceilf.c
+++ b/sysdeps/alpha/fpu/s_ceilf.c
@@ -25,6 +25,9 @@
 float
 __ceilf (float x)
 {
+  if (isnanf (x))
+    return x + x;
+
   if (isless (fabsf (x), 16777216.0f)) /* 1 << FLT_MANT_DIG */
     {
       /* Note that Alpha S_Floating is stored in registers in a
@@ -36,11 +39,7 @@ __ceilf (float x)
 
       new_x = -x;
       __asm ("cvtst/s %3,%2\n\t"
-#ifdef _IEEE_FP_INEXACT
-            "cvttq/svim %2,%1\n\t"
-#else
             "cvttq/svm %2,%1\n\t"
-#endif
             "cvtqt/m %1,%0\n\t"
             : "=f"(new_x), "=&f"(tmp1), "=&f"(tmp2)
             : "f"(new_x));
-- 
2.11.0.rc2




1.1                  
src/patchsets/glibc/2.24/00_all_0036-alpha-fix-floor-on-sNaN-input.patch

file : 
http://sources.gentoo.org/viewvc.cgi/gentoo/src/patchsets/glibc/2.24/00_all_0036-alpha-fix-floor-on-sNaN-input.patch?rev=1.1&view=markup
plain: 
http://sources.gentoo.org/viewvc.cgi/gentoo/src/patchsets/glibc/2.24/00_all_0036-alpha-fix-floor-on-sNaN-input.patch?rev=1.1&content-type=text/plain

Index: 00_all_0036-alpha-fix-floor-on-sNaN-input.patch
===================================================================
>From 9b34c1494d8e61bb3d718e2ea83b856030476737 Mon Sep 17 00:00:00 2001
From: Aurelien Jarno <[email protected]>
Date: Tue, 2 Aug 2016 09:18:59 +0200
Subject: [PATCH] alpha: fix floor on sNaN input

The alpha version of floor wrongly return sNaN for sNaN input. Fix that
by checking for NaN and by returning the input value added with itself
in that case.

Finally remove the code to handle inexact exception, floor should never
generate such an exception.

Changelog:
        * sysdeps/alpha/fpu/s_floor.c (__floor): Add argument with itself
        when it is a NaN.
        [_IEEE_FP_INEXACT] Remove.
        * sysdeps/alpha/fpu/s_floorf.c (__floorf): Likewise.

(cherry picked from commit 65cc568cf57156e5230db9a061645e54ff028a41)
(cherry picked from commit 1912cc082df4739c2388c375f8d486afdaa7d49b)
---
 sysdeps/alpha/fpu/s_floor.c  | 7 +++----
 sysdeps/alpha/fpu/s_floorf.c | 7 +++----
 2 files changed, 6 insertions(+), 8 deletions(-)

diff --git a/sysdeps/alpha/fpu/s_floor.c b/sysdeps/alpha/fpu/s_floor.c
index 1a6f8c461756..9930f6be42af 100644
--- a/sysdeps/alpha/fpu/s_floor.c
+++ b/sysdeps/alpha/fpu/s_floor.c
@@ -27,16 +27,15 @@
 double
 __floor (double x)
 {
+  if (isnan (x))
+    return x + x;
+
   if (isless (fabs (x), 9007199254740992.0))   /* 1 << DBL_MANT_DIG */
     {
       double tmp1, new_x;
 
       __asm (
-#ifdef _IEEE_FP_INEXACT
-            "cvttq/svim %2,%1\n\t"
-#else
             "cvttq/svm %2,%1\n\t"
-#endif
             "cvtqt/m %1,%0\n\t"
             : "=f"(new_x), "=&f"(tmp1)
             : "f"(x));
diff --git a/sysdeps/alpha/fpu/s_floorf.c b/sysdeps/alpha/fpu/s_floorf.c
index 8cd80e2b42d7..015c04f40d80 100644
--- a/sysdeps/alpha/fpu/s_floorf.c
+++ b/sysdeps/alpha/fpu/s_floorf.c
@@ -26,6 +26,9 @@
 float
 __floorf (float x)
 {
+  if (isnanf (x))
+    return x + x;
+
   if (isless (fabsf (x), 16777216.0f)) /* 1 << FLT_MANT_DIG */
     {
       /* Note that Alpha S_Floating is stored in registers in a
@@ -36,11 +39,7 @@ __floorf (float x)
       float tmp1, tmp2, new_x;
 
       __asm ("cvtst/s %3,%2\n\t"
-#ifdef _IEEE_FP_INEXACT
-            "cvttq/svim %2,%1\n\t"
-#else
             "cvttq/svm %2,%1\n\t"
-#endif
             "cvtqt/m %1,%0\n\t"
             : "=f"(new_x), "=&f"(tmp1), "=&f"(tmp2)
             : "f"(x));
-- 
2.11.0.rc2




1.1                  
src/patchsets/glibc/2.24/00_all_0037-alpha-fix-rint-on-sNaN-input.patch

file : 
http://sources.gentoo.org/viewvc.cgi/gentoo/src/patchsets/glibc/2.24/00_all_0037-alpha-fix-rint-on-sNaN-input.patch?rev=1.1&view=markup
plain: 
http://sources.gentoo.org/viewvc.cgi/gentoo/src/patchsets/glibc/2.24/00_all_0037-alpha-fix-rint-on-sNaN-input.patch?rev=1.1&content-type=text/plain

Index: 00_all_0037-alpha-fix-rint-on-sNaN-input.patch
===================================================================
>From 04c5f782796052de9d06975061eb3376ccbcbdb1 Mon Sep 17 00:00:00 2001
From: Aurelien Jarno <[email protected]>
Date: Tue, 2 Aug 2016 09:18:59 +0200
Subject: [PATCH] alpha: fix rint on sNaN input

The alpha version of rint wrongly return sNaN for sNaN input. Fix that
by checking for NaN and by returning the input value added with itself
in that case.

Changelog:
        * sysdeps/alpha/fpu/s_rint.c (__rint): Add argument with itself
        when it is a NaN.
        * sysdeps/alpha/fpu/s_rintf.c (__rintf): Likewise.

(cherry picked from commit cb7f9d63b921ea1a1cbb4ab377a8484fd5da9a2b)
(cherry picked from commit 8eb9a92e0522f2d4f2d4167df919d066c85d3408)
---
 sysdeps/alpha/fpu/s_rint.c  | 3 +++
 sysdeps/alpha/fpu/s_rintf.c | 3 +++
 2 files changed, 6 insertions(+)

diff --git a/sysdeps/alpha/fpu/s_rint.c b/sysdeps/alpha/fpu/s_rint.c
index f33fe72c116b..259348afc08d 100644
--- a/sysdeps/alpha/fpu/s_rint.c
+++ b/sysdeps/alpha/fpu/s_rint.c
@@ -23,6 +23,9 @@
 double
 __rint (double x)
 {
+  if (isnan (x))
+    return x + x;
+
   if (isless (fabs (x), 9007199254740992.0))   /* 1 << DBL_MANT_DIG */
     {
       double tmp1, new_x;
diff --git a/sysdeps/alpha/fpu/s_rintf.c b/sysdeps/alpha/fpu/s_rintf.c
index 1400dfe8d76b..645728ad5b02 100644
--- a/sysdeps/alpha/fpu/s_rintf.c
+++ b/sysdeps/alpha/fpu/s_rintf.c
@@ -22,6 +22,9 @@
 float
 __rintf (float x)
 {
+  if (isnanf (x))
+    return x + x;
+
   if (isless (fabsf (x), 16777216.0f)) /* 1 << FLT_MANT_DIG */
     {
       /* Note that Alpha S_Floating is stored in registers in a
-- 
2.11.0.rc2




1.1                  
src/patchsets/glibc/2.24/00_all_0038-alpha-fix-trunc-for-big-input-values.patch

file : 
http://sources.gentoo.org/viewvc.cgi/gentoo/src/patchsets/glibc/2.24/00_all_0038-alpha-fix-trunc-for-big-input-values.patch?rev=1.1&view=markup
plain: 
http://sources.gentoo.org/viewvc.cgi/gentoo/src/patchsets/glibc/2.24/00_all_0038-alpha-fix-trunc-for-big-input-values.patch?rev=1.1&content-type=text/plain

Index: 00_all_0038-alpha-fix-trunc-for-big-input-values.patch
===================================================================
>From b73ec923c79ab493a9265930a45800391329571a Mon Sep 17 00:00:00 2001
From: Aurelien Jarno <[email protected]>
Date: Tue, 2 Aug 2016 09:18:59 +0200
Subject: [PATCH] alpha: fix trunc for big input values

The alpha specific version of trunc and truncf always add and subtract
0x1.0p23 or 0x1.0p52 even for big values. This causes this kind of
errors in the testsuite:

  Failure: Test: trunc_towardzero (0x1p107)
  Result:
   is:          1.6225927682921334e+32   0x1.fffffffffffffp+106
   should be:   1.6225927682921336e+32   0x1.0000000000000p+107
   difference:  1.8014398509481984e+16   0x1.0000000000000p+54
   ulp       :  0.5000
   max.ulp   :  0.0000

Change this by returning the input value when its absolute value is
greater than 0x1.0p23 or 0x1.0p52. NaN have to go through the add and
subtract operations to get possibly silenced.

Finally remove the code to handle inexact exception, trunc should never
generate such an exception.

Changelog:
        * sysdeps/alpha/fpu/s_trunc.c (__trunc): Return the input value
        when its absolute value is greater than 0x1.0p52.
        [_IEEE_FP_INEXACT] Remove.
        * sysdeps/alpha/fpu/s_truncf.c (__truncf): Return the input value
        when its absolute value is greater than 0x1.0p23.
        [_IEEE_FP_INEXACT] Remove.

(cherry picked from commit b74d259fe793499134eb743222cd8dd7c74a31ce)
(cherry picked from commit e6eab16cc302e6c42f79e1af02ce98ebb9a783bc)
---
 sysdeps/alpha/fpu/s_trunc.c  | 7 +++----
 sysdeps/alpha/fpu/s_truncf.c | 7 +++----
 2 files changed, 6 insertions(+), 8 deletions(-)

diff --git a/sysdeps/alpha/fpu/s_trunc.c b/sysdeps/alpha/fpu/s_trunc.c
index 16cb114a72f5..4b986a6926eb 100644
--- a/sysdeps/alpha/fpu/s_trunc.c
+++ b/sysdeps/alpha/fpu/s_trunc.c
@@ -28,12 +28,11 @@ __trunc (double x)
   double two52 = copysign (0x1.0p52, x);
   double r, tmp;
 
+  if (isgreaterequal (fabs (x), 0x1.0p52))
+    return x;
+
   __asm (
-#ifdef _IEEE_FP_INEXACT
-        "addt/suic %2, %3, %1\n\tsubt/suic %1, %3, %0"
-#else
         "addt/suc %2, %3, %1\n\tsubt/suc %1, %3, %0"
-#endif
         : "=&f"(r), "=&f"(tmp)
         : "f"(x), "f"(two52));
 
diff --git a/sysdeps/alpha/fpu/s_truncf.c b/sysdeps/alpha/fpu/s_truncf.c
index 2290f282954d..3e933561663b 100644
--- a/sysdeps/alpha/fpu/s_truncf.c
+++ b/sysdeps/alpha/fpu/s_truncf.c
@@ -27,12 +27,11 @@ __truncf (float x)
   float two23 = copysignf (0x1.0p23, x);
   float r, tmp;
 
+  if (isgreaterequal (fabsf (x), 0x1.0p23))
+    return x;
+
   __asm (
-#ifdef _IEEE_FP_INEXACT
-        "adds/suic %2, %3, %1\n\tsubs/suic %1, %3, %0"
-#else
         "adds/suc %2, %3, %1\n\tsubs/suc %1, %3, %0"
-#endif
         : "=&f"(r), "=&f"(tmp)
         : "f"(x), "f"(two23));
 
-- 
2.11.0.rc2





Reply via email to