[Bug analyzer/96798] Analyzer failures on Darwin

2020-08-29 Thread iains at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96798

--- Comment #7 from Iain Sandoe  ---
(In reply to David Malcolm from comment #6)
> Thanks!  The "memset" call has become a call to "__builtin___memset_chk"
> (perhaps due to _FORTIFY_SOURCE, or something similar in Darwin's libc?),

(transitive include of strings.h, for macOS >= 10.5)
usr/include/_types.h:#define _FORTIFY_SOURCE 2  /* on by default */

usr/include/strings.h:

#if defined (__GNUC__) && _FORTIFY_SOURCE > 0 && !defined (__cplusplus)
/* Security checking functions.  */
#include 
#endif


secure/_strings.h:

#if _USE_FORTIFY_LEVEL > 0



#if __has_builtin(__builtin___memset_chk) || defined(__GNUC__)
#undef bzero
/* void bzero(void *s, size_t n) */
#define bzero(dest, ...) \
__builtin___memset_chk (dest, 0, __VA_ARGS__, __darwin_obsz0 (dest))
#endif

(AFAIR, fort

> and the analyzer doesn't (yet) know about that builtin.
> 
> I can reproduce the issue by hacking this into the test:
> 
> #define memset(DST, SRC, LEN) \
>   __builtin___memset_chk ((DST), (SRC), (LEN), \
> __builtin_object_size((DST), 0))
> 
> There are at least two issues here:
> (a) looks like region_model::on_call_pre is erroneously treating a builtin I
> haven't coded yet as a no-op; it should instead conservatively assume that
> any escaped/reachable regions are affected
> (b) the analyzer should handle that builtin (and probably others)

[Bug analyzer/96798] Analyzer failures on Darwin

2020-08-29 Thread iains at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96798

--- Comment #8 from Iain Sandoe  ---
(In reply to Iain Sandoe from comment #7)
> (In reply to David Malcolm from comment #6)
> > Thanks!  The "memset" call has become a call to "__builtin___memset_chk"
> > (perhaps due to _FORTIFY_SOURCE, or something similar in Darwin's libc?),
> 
> (transitive include of strings.h, for macOS >= 10.5)
> usr/include/_types.h:#define _FORTIFY_SOURCE 2/* on by default */
> 
> usr/include/strings.h:
> 
> #if defined (__GNUC__) && _FORTIFY_SOURCE > 0 && !defined (__cplusplus)
> /* Security checking functions.  */
> #include 
> #endif
> 
> 
> secure/_strings.h:
> 
> #if _USE_FORTIFY_LEVEL > 0
> 
> 
> 
> #if __has_builtin(__builtin___memset_chk) || defined(__GNUC__)
> #undef bzero
> /* void bzero(void *s, size_t n) */
> #define bzero(dest, ...) \
> __builtin___memset_chk (dest, 0, __VA_ARGS__, __darwin_obsz0
> (dest))
> #endif
> 

Oops hit send too soon.

string.h is a transitive include of strings.h and has:
#if defined (__GNUC__) && _FORTIFY_SOURCE > 0 && !defined (__cplusplus)
/* Security checking functions.  */
#include 
#endif

_strings.h:

#if __IPHONE_OS_VERSION_MIN_REQUIRED >= 7 || \
__MAC_OS_X_VERSION_MIN_REQUIRED >= 1090 || \
defined(__DRIVERKIT_VERSION_MIN_REQUIRED)
#if __has_builtin(__builtin___memccpy_chk) && __HAS_FIXED_CHK_PROTOTYPES
#undef memccpy
/* void *memccpy(void *dst, const void *src, int c, size_t n) */
#define memccpy(dest, ...) \
__builtin___memccpy_chk (dest, __VA_ARGS__, __darwin_obsz0 (dest))
#endif
#endif

So.. essentially, the checked builtins are going to be used everywhere by
default on modern Darwin (and some are going to be used even on venerable
Darwin).
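
For readers without a Darwin system, the memset rewrite discussed in this bug can be approximated roughly as follows. This is only a sketch, assuming GCC or clang: the macro imitates what the fortified headers do, and the function name first_byte_after_clear is hypothetical, chosen just to give the analyzer a call to look at.

```c
#include <stddef.h>
#include <string.h>

/* Imitation of a fortified header: route memset through the checked builtin.
   This macro is an illustration, not a copy of Darwin's actual header.  */
#undef memset
#define memset(DST, VAL, LEN) \
  __builtin___memset_chk ((DST), (VAL), (LEN), \
                          __builtin_object_size ((DST), 0))

/* Hypothetical test function: after the (checked) memset the buffer must be
   all zero, which is what the analyzer needs to understand as well.  */
int
first_byte_after_clear (void)
{
  char buf[16];
  buf[0] = 'x';
  memset (buf, 0, sizeof buf);
  return buf[0];
}
```

Compiling this with -fanalyzer should exercise the same builtin that the Darwin headers introduce, without needing _FORTIFY_SOURCE to be set by the system.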

[Bug analyzer/96798] Analyzer failures on Darwin

2020-08-29 Thread iains at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96798

--- Comment #9 from Iain Sandoe  ---
(In reply to Iain Sandoe from comment #8)
> (In reply to Iain Sandoe from comment #7)
> > (In reply to David Malcolm from comment #6)
> > > Thanks!  The "memset" call has become a call to "__builtin___memset_chk"
> > > (perhaps due to _FORTIFY_SOURCE, or something similar in Darwin's libc?),
> > 
> > (transitive include of strings.h, for macOS >= 10.5)
> > usr/include/_types.h:#define _FORTIFY_SOURCE 2  /* on by default */
> > 
> > usr/include/strings.h:
> > 
> > #if defined (__GNUC__) && _FORTIFY_SOURCE > 0 && !defined (__cplusplus)
> > /* Security checking functions.  */
> > #include 
> > #endif
> > 
> > 
> > secure/_strings.h:
> > 
> > #if _USE_FORTIFY_LEVEL > 0
> > 
> > 
> > 
> > #if __has_builtin(__builtin___memset_chk) || defined(__GNUC__)
> > #undef bzero
> > /* void bzero(void *s, size_t n) */
> > #define bzero(dest, ...) \
> > __builtin___memset_chk (dest, 0, __VA_ARGS__, __darwin_obsz0
> > (dest))
> > #endif
> > 
> 
> Oops hit send too soon.
> 
> string.h is a transitive include of strings.h and has:
> #if defined (__GNUC__) && _FORTIFY_SOURCE > 0 && !defined (__cplusplus)
> /* Security checking functions.  */
> #include 
> #endif
> 
> _strings.h:
^^ typo -- secure/_string.h:
> 
> #if __IPHONE_OS_VERSION_MIN_REQUIRED >= 7 ||
> __MAC_OS_X_VERSION_MIN_REQUIRED >= 1090 || \
> defined(__DRIVERKIT_VERSION_MIN_REQUIRED)
> #if __has_builtin(__builtin___memccpy_chk) && __HAS_FIXED_CHK_PROTOTYPES
> #undef memccpy
> /* void *memccpy(void *dst, const void *src, int c, size_t n) */
> #define memccpy(dest, ...) \
> __builtin___memccpy_chk (dest, __VA_ARGS__, __darwin_obsz0
> (dest))
> #endif
> #endif
> 
> So.. essentially, the checked builtins are going to be used everywhere by
> default on modern Darwin (and some are going to be used even on venerable
> Darwin).

[Bug fortran/96495] [gfortran] Composition of user-defined operators does not copy ALLOCATABLE property of derived type

2020-08-29 Thread pault at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96495

--- Comment #4 from Paul Thomas  ---
The fix is submitted at:
https://gcc.gnu.org/pipermail/fortran/2020-August/054945.html

Regards

Paul

[Bug middle-end/96200] Implement __builtin_thread_pointer() and __builtin_set_thread_pointer() if TLS is supported

2020-08-29 Thread hjl.tools at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96200

--- Comment #7 from H.J. Lu  ---
(In reply to Florian Weimer from comment #6)
> (In reply to H.J. Lu from comment #4)
> > On Linux/i386 and Linux/x86-64, thread pointer access is done via syscall.
> > On Linux/x86-64, __builtin_thread_pointer and __builtin_set_thread_pointer
> > may be implemented with FSGSBASE ISA.  Is it possible to implement these
> > builtins on Linux/i386 and Linux/x86-64 for all processors?
> 
> It's effectively part of the x86-64 ABI, but I think it's currently
> undocumented. On x86-64, it looks like this:
> 
> static inline void *
> thread_pointer (void)
> {
>   void *result;
>   asm ("mov %%fs:0, %0" : "=r" (result));
>   return result;
> }
> 
> i386 is similar, but with %gs, I think.
> 
> This is ABI since the early NPTL days, and GCC knows about this very
> explicitly, to implement the -mno-tls-direct-seg-refs option.

Given that the tcb field is set up by the C run-time on Linux/x86, should
it be provided by a run-time header file?

[Bug middle-end/96200] Implement __builtin_thread_pointer() and __builtin_set_thread_pointer() if TLS is supported

2020-08-29 Thread fw at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96200

--- Comment #8 from Florian Weimer  ---
(In reply to H.J. Lu from comment #7)
> Give that the tcb field is setup by the C run-time on Linux/x86, should
> it be provided by a run-time header file?

Yes, it seems reasonable to me. Ideally, it would be documented in the ABI
manual as well.
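
A run-time header along those lines might look roughly like the sketch below. The name demo_thread_pointer is hypothetical and is not an existing glibc or GCC interface; the x86-64 branch repeats the asm quoted earlier in this bug, the i386 branch assumes the %gs variant mentioned there, and every other target simply returns NULL because it is outside the scope of this sketch.

```c
#include <stddef.h>

/* Hypothetical accessor; in a real header this would probably be
   "static inline".  Only Linux/x86 is handled here.  */
void *
demo_thread_pointer (void)
{
#if defined (__linux__) && defined (__x86_64__)
  void *result;
  /* The first word of the TCB holds the thread pointer itself.  */
  __asm__ ("mov %%fs:0, %0" : "=r" (result));
  return result;
#elif defined (__linux__) && defined (__i386__)
  void *result;
  /* Assumed i386 equivalent, using %gs instead of %fs.  */
  __asm__ ("mov %%gs:0, %0" : "=r" (result));
  return result;
#else
  return NULL;  /* Not covered by this sketch.  */
#endif
}
```

On the targets it covers, the returned pointer should point at a word that contains the same pointer again, which gives a cheap sanity check.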

[Bug middle-end/92210] no warning for invariable used in loop condition (i.e. add clang's -Wfor-loop-analysis)

2020-08-29 Thread dcb314 at hotmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92210

David Binderman  changed:

   What|Removed |Added

 CC||dcb314 at hotmail dot com

--- Comment #4 from David Binderman  ---
Not only does clang detect that the control variable isn't changed,
it also detects when the control variable is changed too much.

Code like this gets warned:

for (int i = 0; i < 10; ++i)
{

[Bug middle-end/92210] no warning for invariable used in loop condition (i.e. add clang's -Wfor-loop-analysis)

2020-08-29 Thread dcb314 at hotmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92210

--- Comment #5 from David Binderman  ---
Not only does clang detect that the control variable isn't changed,
it also detects when the control variable is changed too much.

Code like this gets warned:

for (int i = 0; i < 10; ++i)
{
 // whatever
 ++i;
}

aug29c.cc:9:5: warning: variable 'i' is incremented both in the loop header and
in the loop body [-Wfor-loop-analysis]
++i;
  ^
aug29c.cc:6:28: note: incremented here
for (int i = 0; i < 10; ++i)
  ^
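
To show why the double increment is usually a bug, here is a small sketch that counts how often the body runs. The function name count_body_runs is made up for this illustration.

```c
/* Counts how many times the loop body executes when 'i' is incremented
   both in the loop header and in the body.  */
int
count_body_runs (void)
{
  int runs = 0;
  for (int i = 0; i < 10; ++i)
    {
      ++runs;
      ++i;  /* Second increment: the pattern -Wfor-loop-analysis reports.  */
    }
  return runs;
}
```

The body sees i = 0, 2, 4, 6, 8, so it runs 5 times rather than the 10 a reader of the loop header would expect.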

[Bug c/96842] New: enhancement: copy clang Wheader-guard

2020-08-29 Thread dcb314 at hotmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96842

Bug ID: 96842
   Summary: enhancement: copy clang Wheader-guard
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: dcb314 at hotmail dot com
  Target Milestone: ---

clang checks header guard macros and gcc doesn't.

So code in  like

#ifndef _STDIO_H
#define _STDIO_H_1

where the define is different to the previous test, gets caught.

For example:

AIPlayerFactory.h:19:2: warning: 'AMEOBAX_AI_PLAYER_FACTORY_H' is used as a
header guard here, followed by #define of a different macro [-Wheader-guard]

There are about 500 examples of this in Fedora Linux distribution.
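
The effect of such a mismatch can be shown in a single file. This is a sketch with made-up macro names (DEMO_GUARD_H, DEMO_GUARD_H_1) and a hypothetical helper function; it only demonstrates that the tested macro never gets defined, so the guard would not stop a second inclusion.

```c
/* Broken guard: the macro that is tested is not the macro that is defined.  */
#ifndef DEMO_GUARD_H
#define DEMO_GUARD_H_1
#endif

/* Returns 1 if the tested guard macro ended up defined, 0 otherwise.  */
int
guard_macro_is_defined (void)
{
#ifdef DEMO_GUARD_H
  return 1;
#else
  return 0;
#endif
}
```

Here guard_macro_is_defined returns 0, which is exactly the situation the clang warning is meant to catch.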

[Bug c++/90885] GCC should warn about 2^16 and 2^32 and 2^64

2020-08-29 Thread dcb314 at hotmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90885

David Binderman  changed:

   What|Removed |Added

 CC||dcb314 at hotmail dot com

--- Comment #22 from David Binderman  ---
clang only seems to warn for 2 ^ X and 10 ^ Y.

There seem to be about 30 cases of this problem across the Fedora Linux
distribution, so not the biggest problem in the world.
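
For anyone unsure why 2 ^ X deserves a warning, a short sketch (function names are invented for this example): in C and C++ the ^ operator is bitwise XOR, not exponentiation.

```c
/* What the author probably did NOT mean: bitwise XOR of 2 and 16.  */
int
xor_two_sixteen (void)
{
  return 2 ^ 16;   /* 0b00010 ^ 0b10000 = 0b10010 */
}

/* What was probably intended: two to the power of sixteen.  */
int
two_to_the_sixteenth (void)
{
  return 1 << 16;
}

/* Same trap with base ten: XOR of 10 and 3, not one thousand.  */
int
xor_ten_three (void)
{
  return 10 ^ 3;   /* 0b1010 ^ 0b0011 = 0b1001 */
}
```

The three functions return 18, 65536 and 9 respectively. Newer compilers may emit a warning on the XOR lines, which is the point of this bug.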

[Bug fortran/96839] gfortran thinks common_bits starts a common block

2020-08-29 Thread dominiq at lps dot ens.fr
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96839

Dominique d'Humieres  changed:

   What|Removed |Added

 Status|UNCONFIRMED |WAITING
   Priority|P3  |P4
 Ever confirmed|0   |1
   Last reconfirmed||2020-08-29

--- Comment #1 from Dominique d'Humieres  ---
With GCC10 and 11 the errors are

pr96839.f90:47:38:

   47 | common_bits = min(self % bits, set2 % bits)
  |  1
Error: Expected argument list at (1)
pr96839.f90:57:25:

   57 |bits = self % bits
  | 1
Error: Expected argument list at (1)
pr96839.f90:25:17:

   25 | procedure, pass(self) :: bits
  | 1
Error: Procedure 'bits' at (1) has the same name as a component of 'bitset_t'

With GCC7 to GCC9 the first error is replaced with

pr96839.f90:47:14:

   47 | common_bits = min(self % bits, set2 % bits)
  |  1
Error: Syntax error in COMMON statement at (1)

Could this PR be considered as FIXED?

[Bug c++/91618] template-id required to friend a function template, even for a qualified-id

2020-08-29 Thread language.lawyer at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91618

Language Lawyer  changed:

   What|Removed |Added

 CC||language.lawyer at gmail dot com

--- Comment #5 from Language Lawyer  ---
Dup of bug 88725

[Bug fortran/96839] gfortran thinks common_bits starts a common block

2020-08-29 Thread w.clodius at icloud dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96839

--- Comment #2 from William Clodius  ---
I think so.

> On Aug 29, 2020, at 8:17 AM, dominiq at lps dot ens.fr 
>  wrote:
> 
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96839
> 
> Dominique d'Humieres  changed:
> 
>   What|Removed |Added
> 
> Status|UNCONFIRMED |WAITING
>   Priority|P3  |P4
> Ever confirmed|0   |1
>   Last reconfirmed||2020-08-29
> 
> --- Comment #1 from Dominique d'Humieres  ---
> With GCC10 and 11 the errors are
> 
> pr96839.f90:47:38:
> 
>   47 | common_bits = min(self % bits, set2 % bits)
>  |  1
> Error: Expected argument list at (1)
> pr96839.f90:57:25:
> 
>   57 |bits = self % bits
>  | 1
> Error: Expected argument list at (1)
> pr96839.f90:25:17:
> 
>   25 | procedure, pass(self) :: bits
>  | 1
> Error: Procedure 'bits' at (1) has the same name as a component of 'bitset_t'
> 
> With GCC7 to GCC9 the first error is replaced with
> 
> pr96839.f90:47:14:
> 
>   47 | common_bits = min(self % bits, set2 % bits)
>  |  1
> Error: Syntax error in COMMON statement at (1)
> 
> Could this PR be considered as FIXED?
> 
> -- 
> You are receiving this mail because:
> You reported the bug.

[Bug fortran/96843] New: gfortran rejects as shape mismatch rank one logical array arguments

2020-08-29 Thread w.clodius at icloud dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96843

Bug ID: 96843
   Summary: gfortran rejects as shape mismatch rank one logical
array arguments
   Product: gcc
   Version: 10.2.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: fortran
  Assignee: unassigned at gcc dot gnu.org
  Reporter: w.clodius at icloud dot com
  Target Milestone: ---

Created attachment 49153
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49153&action=edit
A greatly reduced example of the rejected code.

I have a code with procedures with rank one logical array arguments with
INTENT(IN). They require an INTERFACE. gfortran 10.2 and 8.1 reject the
interface code with the message

gfortran -fmax-errors=10 test_shape_mismatch.f90
test_shape_mismatch.f90:56:60:

   56 | pure module subroutine assign_log8_large( self, alogical )
  |1
Error: Shape mismatch in argument 'alogical' at (1)


As near as I can tell the shapes and other attributes of the argument lists
agree between the interface and the main procedure body.

[Bug fortran/96843] gfortran rejects as shape mismatch rank one logical array arguments

2020-08-29 Thread dominiq at lps dot ens.fr
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96843

Dominique d'Humieres  changed:

   What|Removed |Added

 Status|UNCONFIRMED |WAITING
   Last reconfirmed||2020-08-29
 Ever confirmed|0   |1

--- Comment #1 from Dominique d'Humieres  ---
This seems fixed on GCC11.

[Bug middle-end/96200] Implement __builtin_thread_pointer() and __builtin_set_thread_pointer() if TLS is supported

2020-08-29 Thread hjl.tools at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96200

--- Comment #9 from H.J. Lu  ---
(In reply to Florian Weimer from comment #8)
> (In reply to H.J. Lu from comment #7)
> > Give that the tcb field is setup by the C run-time on Linux/x86, should
> > it be provided by a run-time header file?
> 
> Yes, it seems reasonable to me. Ideally, it would be documented in the ABI
> manual as well.

https://gitlab.com/x86-psABIs/x86-64-ABI/-/merge_requests/14

[Bug c/96844] New: OpenMP: two worksharing constructs with different num_threads clauses break thread pooling

2020-08-29 Thread mority at posteo dot net
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96844

Bug ID: 96844
   Summary: OpenMP: two worksharing constructs with different
num_threads clauses break thread pooling
   Product: gcc
   Version: 10.2.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: mority at posteo dot net
  Target Milestone: ---

Created attachment 49154
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49154&action=edit
Code that produces bug

Hi,

if a for loop contains two OpenMP worksharing constructs which specify
different values in their num_threads clauses, thread pooling seems not to be
working correctly. 

E.g., the first worksharing construct has num_threads(2) and the second
num_threads(4). The expected behavior would be that a total of 4 threads is
created. The first worksharing construct uses 2 of these threads and the second
all of them. 

However, this seems not to be the case. While thread pooling seems to work for the
first worksharing construct, it fails for the second. Every time the second
worksharing construct is executed, 2 new threads are created. This causes
significant overhead.

For clarification: There is no nested parallelism.

The attached code can be used to reproduce the bug. The code can be compiled
into 4 different versions using conditional compilation:

1. no OpenMP
gcc -O3 -I. -Wall -g -DPRINT_TID mwe2_woMPI.c -o mwe2_woMPI

2. worksharing construct foo only
gcc -O3 -I. -Wall -g -DPRINT_TID -DPRAGMA_FOO -fopenmp mwe2_woMPI.c -o mwe2_woMPI_foo

3. worksharing construct bar only
gcc -O3 -I. -Wall -g -DPRINT_TID -DPRAGMA_BAR -fopenmp mwe2_woMPI.c -o mwe2_woMPI_bar

4. both worksharing constructs
gcc -O3 -I. -Wall -g -DPRINT_TID -DPRAGMA_FOO -DPRAGMA_BAR -fopenmp mwe2_woMPI.c -o mwe2_woMPI_foobar

I analyzed the output of the different versions which contains the thread id
for every iteration. Each worksharing construct in isolation works correctly
and 2 or 4 threads are created, respectively. However, if both worksharing
constructs are used at the same time, the first worksharing construct uses 2
different threads and the second 22 different threads.

GCC versions 8.3, 9.2, and 10.2 all show this behavior. I also compiled the
code with clang 10.1 and icc 19.4 which both handle the case correctly.
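
Since the attachment is not reproduced in this mail, here is a rough sketch of the shape of the problem as described above. It is not the attached mwe2_woMPI.c; the names run_foo_bar, N and data are invented, and the thread-id printing is left out. Without -fopenmp the pragmas are ignored and the code runs serially.

```c
enum { N = 1000 };

/* Each outer iteration runs two worksharing loops with different
   num_threads clauses, the pattern reported to defeat thread pooling.  */
long
run_foo_bar (int outer_iterations)
{
  static int data[N];
  long total = 0;

  for (int it = 0; it < outer_iterations; it++)
    {
      /* "foo": asks for 2 threads.  */
      #pragma omp parallel for num_threads(2)
      for (int i = 0; i < N; i++)
        data[i] = i;

      /* "bar": asks for 4 threads.  */
      long sum = 0;
      #pragma omp parallel for num_threads(4) reduction(+:sum)
      for (int i = 0; i < N; i++)
        sum += data[i];

      total += sum;
    }
  return total;
}
```

The numerical result is the same either way (each outer iteration adds 499500); the reported issue is only about how many distinct threads libgomp creates while computing it.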

[Bug c/96844] OpenMP: two worksharing constructs with different num_threads clauses break thread pooling

2020-08-29 Thread mority at posteo dot net
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96844

--- Comment #1 from Moritz Fischer  ---
Created attachment 49155
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49155&action=edit
python script to count number of different threads used for each worksharing
construct

[Bug fortran/96843] gfortran rejects as shape mismatch rank one logical array arguments

2020-08-29 Thread dominiq at lps dot ens.fr
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96843

--- Comment #2 from Dominique d'Humieres  ---
The code compiles with r11-2639 (2020-08-10), but gives the error with r11-2402
(2020-07-29). The fix may be r11-2489 for pr96320.

[Bug c++/67135] [thread_local] heap-use-after-free (OS X 10.10.4)

2020-08-29 Thread tobias.bruell at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67135

Toby Brull  changed:

   What|Removed |Added

 CC||tobias.bruell at gmail dot com

--- Comment #1 from Toby Brull  ---
This seems to be fixed from version 5.3 on.

Was able to confirm the bug in gcc 5.2 via wandbox.org (although it failed
there with a different ASAN error).

Testing on wandbox.org, this worked for gcc version:
5.3,
6.1, 6.2, 6.3,
7.1,  7.3,
8.1,  8.3,
  9.3,
10.1

Also worked on my local ubuntu gcc installs (6.5, 7.5, 8.4, 10.1).

So should probably be closed?

[Bug libgcc/96845] New: undefined reference to `__aarch64_ldadd4_acq_rel'

2020-08-29 Thread bero at lindev dot ch
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96845

Bug ID: 96845
   Summary: undefined reference to `__aarch64_ldadd4_acq_rel'
   Product: gcc
   Version: 10.2.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: libgcc
  Assignee: unassigned at gcc dot gnu.org
  Reporter: bero at lindev dot ch
  Target Milestone: ---

When compiling some packages on aarch64 (e.g. polkit-qt-1 0.113.0),
the build errors out with

/usr/lib64/gcc/aarch64-openmandriva-linux-gnu/10.2.0/../../../../aarch64-openmandriva-linux-gnu/bin/ld:
core/CMakeFiles/polkit-qt5-core-1.dir/polkitqt1-authority.cpp.o: in function
`std::__atomic_base::operator++()':
/usr/include/c++/10.2.0/bits/atomic_base.h:326: undefined reference to
`__aarch64_ldadd4_acq_rel'

This seems to be caused by __aarch64_ldadd4_acq_rel being defined in libgcc,
but not libgcc_s, while only libgcc_s is pulled in automatically.

Some Linux distributions have a workaround for this in their gcc packaging -
they replace libgcc_s.so with an ld script that pulls in libgcc if needed (see
e.g. https://src.fedoraproject.org/rpms/gcc/blob/master/f/gcc.spec#_1303 )
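
A much smaller test than polkit-qt-1 might look like the sketch below. It is an assumption on my side that a 4-byte atomic add with acquire-release ordering is what ends up calling the out-of-line helper when outline atomics are in use; the names counter and bump are made up.

```c
/* Shared counter updated atomically.  */
static int counter;

/* Atomically increments the counter and returns the new value.  On aarch64
   with outline atomics this is expected (assumption) to call a libgcc
   helper such as __aarch64_ldadd4_acq_rel instead of inlining the loop.  */
int
bump (void)
{
  return __atomic_add_fetch (&counter, 1, __ATOMIC_ACQ_REL);
}
```

If linking a program that calls bump fails with the same undefined reference, the problem is in how libgcc_s.so and libgcc are pulled in, not in the package being built.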

[Bug c++/69775] thread_local extern variable causes linkage error

2020-08-29 Thread tobias.bruell at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69775

Toby Brull  changed:

   What|Removed |Added

 CC||tobias.bruell at gmail dot com

--- Comment #3 from Toby Brull  ---
The "minimal example with --save-tems" is working for me on gcc versions 7.5,
8.4, 10.1.

Close?

[Bug target/96846] New: [x86] Prefer xor/test/setcc over test/setcc/movzx sequence

2020-08-29 Thread andysem at mail dot ru
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96846

Bug ID: 96846
   Summary: [x86] Prefer xor/test/setcc over test/setcc/movzx
sequence
   Product: gcc
   Version: 10.2.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: andysem at mail dot ru
  Target Milestone: ---

This has been reported already in bug 86352, but that bug also describes a few
other issues, so I decided to create a separate bug focused on this one
particular issue.

When performing boolean operations, gcc often prefers the following pattern:

1. Test instruction (test/cmp).
2. "setcc %al" (where al can be any 8-bit register).
3. If the bool needs to be further processed, "movzx %al, %eax" (where al and
eax are 8 and 32-bit versions of the register picked in #2).

The example code can be seen here:

https://gcc.godbolt.org/z/z3WGbq

For convenience I'm posting the code below:

bool narrow_boolean(int a) { return a!=5; }

unsigned int countmatch(unsigned int arr[], unsigned int key)
{
unsigned count = 0;
for (int i=0; i<1024 ; i++) {
count += ((arr[i] & key) != 5);
}
return count;
}

narrow_boolean(int):
cmp edi, 5
setne   al
ret

countmatch(unsigned int*, unsigned int):
lea rcx, [rdi+4096]
xor eax, eax
.L4:
mov edx, DWORD PTR [rdi]
and edx, esi
cmp edx, 5
setne   dl
movzx   edx, dl
add rdi, 4
add eax, edx
cmp rcx, rdi
jne .L4
ret

Command line parameters: -Wall -O3 -mtune=skylake -fno-tree-vectorize

The problem with the generated code is as follows:

- The setcc instruction only modifies the lower 8 bits of the full
architectural register. The upper bits remain unmodified and potentially
"dirty" meaning that the following instructions taking this register as input
may require merging the full register value, with a performance penalty.
- Since setcc preserves upper bits, the following instructions consuming the
output register become dependent on the previous instructions that write that
register. This results in an unnecessary dependency. This is especially a
problem in the case of narrow_boolean above.
- On Haswell and later, "movzx %al, %eax" is not eliminated at register rename
and consumes ALU and has non-zero latency. "movzx %al, %ebx" (i.e. when the
output register is different from input) is eliminated at rename stage, but
requires an additional register. But gcc seems to prefer the former form,
unless -frename-registers is specified.

See this excellent StackOverflow question and answers for details:

https://stackoverflow.com/questions/45660139/how-exactly-do-partial-registers-on-haswell-skylake-perform-writing-al-seems-to

(BTW, the code in this bug originated from that question.)

A better instruction pattern would be this:

1. Zero the target flag register beforehand with "xor %eax, %eax".
2. Perform the test.
3. "setcc %al" to set the flag.

If this pattern is in a loop body, the initial xor can be hoisted out of the
loop. The important part is that the xor eliminates the dependency and doesn't
cost an ALU uop. Arguably, it is also smaller code since xor is only 2 bytes vs.
3 bytes for movzx.

The initial zeroing requires an additional register, meaning that it cannot
reuse the register involved in the test in #2. However, that is often not a
problem, like in the examples above. I guess, the compiler could estimate if
there are spare registers and use xor/test/setcc sequence if there are and
test/setcc/movzx if not.
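
To make the proposed sequence concrete, here is a hand-written sketch using GCC-style inline asm (AT&T syntax, the default). The function name not_equal_5_xor_setcc is invented; the early-clobber "=&q" constraint is what forces the result into a register different from the input, which is the extra-register requirement mentioned above. Non-x86 targets fall back to plain C.

```c
/* Returns 1 if a != 5, else 0, using xor/cmp/setcc instead of
   cmp/setcc/movzx on x86.  */
unsigned
not_equal_5_xor_setcc (int a)
{
#if defined (__x86_64__) || defined (__i386__)
  unsigned result;
  __asm__ ("xorl %0, %0\n\t"   /* 1. zero the full result register     */
           "cmpl $5, %1\n\t"   /* 2. perform the test                   */
           "setne %b0"         /* 3. write only the low byte            */
           : "=&q" (result)    /* early clobber: must not alias input   */
           : "r" (a)
           : "cc");
  return result;
#else
  return a != 5;               /* portable fallback for other targets   */
#endif
}
```

Note that the xor has to come before the cmp because it clobbers the flags, so the compiler cannot simply insert it next to the setcc.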

[Bug libfortran/93727] Fortran 2018: EX edit descriptor

2020-08-29 Thread jvdelisle at charter dot net
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93727

--- Comment #4 from jvdelisle at charter dot net ---
An Update. I have the front end and runtime parsing for OUTPUT done and am now
looking at the actual implementation.  We have the printf series of functions
available and can use the %A format specifier to create the hexadecimal float
string.

I want to note here that on input, apparently the F editing requires one to be
able to READ a hexadecimal float with the F descriptor as well as the EX
descriptor. See 
13.7.2.3.2 part 7.

Another aspect I am studying has to do with rounding.  On output I assume we
must support RU, RD, RN vs truncating. Considering that the purpose of this
type of representation of a float is mostly for "looking under the hood" I
initially was thinking simple truncation should suffice and if a user does not
know they have not specified enough precision for all hexadecimals, tough
beans.  After all, in hexadecimal, it is always an exact fit. I am curious what
others think about this.

Regardless, we will need a new rounding function to round the hexadecimals
before output.

On input, obviously we need to do a new read function and for the F descriptor
we will have to look ahead to identify that it is a hexadecimal before
processing it, otherwise it just looks like a bad float.
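
As a rough illustration of the C library side of this (not of the libgfortran code, which does not exist yet), the sketch below formats a double with %A and reads it back with strtod. The function name hex_float_round_trips is invented for this example.

```c
#include <stdio.h>
#include <stdlib.h>

/* Formats x as a hexadecimal float with %A (no precision given, so the
   representation is exact) and parses it back.  Returns 1 if the value
   survives the round trip unchanged.  */
int
hex_float_round_trips (double x)
{
  char buf[64];
  snprintf (buf, sizeof buf, "%A", x);
  return strtod (buf, NULL) == x;
}
```

With an explicit precision such as %.2A the C library has to drop digits, which is where the rounding question above comes in.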

[Bug c++/59994] [meta-bug] thread_local

2020-08-29 Thread tobias.bruell at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59994

Toby Brull  changed:

   What|Removed |Added

 CC||tobias.bruell at gmail dot com

--- Comment #1 from Toby Brull  ---
On 66360 someone commented that it is unrelated to "thread_local"; so probably
shouldn't be on the "Depends on" list of this one?

[Bug libgcc/96845] undefined reference to `__aarch64_ldadd4_acq_rel'

2020-08-29 Thread pinskia at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96845

Andrew Pinski  changed:

   What|Removed |Added

 Resolution|--- |INVALID
 Status|UNCONFIRMED |RESOLVED

--- Comment #1 from Andrew Pinski  ---
This is a distro issue.
libgcc_s.so is a linker script when compiling GCC by itself.
It contains:
/* GNU ld script
   Use the shared library, but some functions are only in
   the static library.  */
GROUP ( libgcc_s.so.1 -lgcc )

 CUT 
So again this is a distro issue.

[Bug libgcc/96845] undefined reference to `__aarch64_ldadd4_acq_rel'

2020-08-29 Thread pinskia at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96845

--- Comment #2 from Andrew Pinski  ---
(In reply to Bernhard Rosenkraenzer from comment #0)
> Some Linux distributions have a workaround for this in their gcc packaging -
> they replace libgcc_s.so with an ld script that pulls in libgcc if needed

Or rather they replace the existing ld script with another ld script ...

[Bug target/96846] [x86] Prefer xor/test/setcc over test/setcc/movzx sequence

2020-08-29 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96846

Jakub Jelinek  changed:

   What|Removed |Added

 CC||jakub at gcc dot gnu.org

--- Comment #1 from Jakub Jelinek  ---
This is something GCC is well aware of and attempts to optimize, it has several
peephole2s, starting with the
;; Convert setcc + movzbl to xor + setcc if operands don't overlap.
comment in i386.md.
The reason why a xor isn't added in your first function is that there is no
extension in that case, usually callers will just use the 8-bit register and
then the xor would be a waste of time.
The second case isn't handled because there is no place to insert the xor to,
xor modifies flags, so it can't go after the comparison that sets the flags,
and in this case can't go before either, because the register is live there (it
is an operand of the comparison).  So, I guess the only way around would be to
look if there is some currently unused register that could be used instead of
the one chosen by the register allocation, but that is not always the case.

[Bug c++/58366] invocation of thread_local class containing bound function leads to : "Illegal instruction: 4"

2020-08-29 Thread tobias.bruell at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58366

Toby Brull  changed:

   What|Removed |Added

 CC||tobias.bruell at gmail dot com

--- Comment #13 from Toby Brull  ---
Seems to be fixed in more recent versions (I tried 7.5, 8.4, 10.1).

[Bug c++/88292] Static initialization problem with thread_local and templates

2020-08-29 Thread tobias.bruell at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88292

--- Comment #2 from Toby Brull  ---
Probably (at least partly) a duplicate of PR 81880. Seems to be working now on
more recent versions (7.5, 8.4, 10.1), even though PR 81880 still persists.

Close?

[Bug target/96846] [x86] Prefer xor/test/setcc over test/setcc/movzx sequence

2020-08-29 Thread andysem at mail dot ru
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96846

--- Comment #2 from andysem at mail dot ru ---
(In reply to Jakub Jelinek from comment #1)

In the godbolt link there is also a third case, which is similar to the second
one, but does not reuse the source register for comparison results.

unsigned int doubletmatch(unsigned int arr[], unsigned int key)
{
unsigned count = 0;
for (int i=0; i<1024 ; i++) {
count += (arr[i] == key) | (arr[i] == ~key);
}
return count;
}

doubletmatch(unsigned int*, unsigned int):
mov r9d, esi
not r9d
lea rcx, [rdi+4096]
xor r8d, r8d
.L7:
mov edx, DWORD PTR [rdi]
cmp edx, esi
sete    al
cmp edx, r9d
sete    dl
or  eax, edx
movzx   eax, al
add rdi, 4
add r8d, eax
cmp rcx, rdi
jne .L7
mov eax, r8d
ret

Note that in this case eax is not cleared either, so the "or eax, edx" has a
dependency on whatever prior instruction wrote to eax (before sete).

[Bug middle-end/87256] hppa spends huge amount of time in synth_mult()

2020-08-29 Thread slyfox at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87256

--- Comment #18 from Sergei Trofimovich  ---
Thank you!

[Bug target/96846] [x86] Prefer xor/test/setcc over test/setcc/movzx sequence

2020-08-29 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96846

Jakub Jelinek  changed:

   What|Removed |Added

 CC||uros at gcc dot gnu.org

--- Comment #3 from Jakub Jelinek  ---
mov edx, DWORD PTR [rdi]
cmp edx, esi
sete    al
cmp edx, r9d
sete    dl
or  eax, edx
movzx   eax, al
This isn't what the peepholes are looking for, there are several other insns in
between, and peephole2s only work on exact insn sequences, doing anything more
complex would require doing it in some machine specific pass.
Note, while in theory it could add xor eax, eax before the cmp edx, esi insn,
it can't add xor edx, edx because the second comparison uses that register.

[Bug target/96846] [x86] Prefer xor/test/setcc over test/setcc/movzx sequence

2020-08-29 Thread andysem at mail dot ru
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96846

--- Comment #4 from andysem at mail dot ru ---
(In reply to Jakub Jelinek from comment #3)
> mov edx, DWORD PTR [rdi]
> cmp edx, esi
> seteal
> cmp edx, r9d
> setedl
> or  eax, edx
> movzx   eax, al
> This isn't what the peepholes are looking for, there are several other insns
> in between, and peephole2s only work on exact insn sequences, doing anything
> more complex would require doing it in some machine specific pass.

Yes, I think, this optimization needs to happen at an earlier stage. Rewriting
fixed instruction sequences doesn't allow for further optimizations like
hoisting the xor out of the loop body.

> Note, while in theory it could add xor eax, eax before the cmp edx, esi
> insn, it can't add xor edx, edx because the second comparison uses that
> register.

I don't think it should generate "xor edx, edx". I think, the logic has to be
roughly something like this:

1. Check if there is a spare register that we can use for the test result. If
there is, allocate it.
2. If we have a register, clear it with a xor before the test. Ideally, move
that xor out of the loop.
3. If not, decide if we are going to reuse one of the source registers or spill
some other register.
4. In the former case, keep the test/setcc/movzx sequence. In the latter, we
can still use xor/test/setcc, after spilling the victim register.

I.e. the main point is that it shouldn't try reusing the source register as
much; only reuse when you have to. Maybe, this requires some help from the
register allocator.

I admit I have little knowledge of how gcc works internally, so I may be
talking nonsense. These are just my naive thoughts about it.
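
The four steps above can be sketched as a small decision function. This is only
a toy model of the proposal, not GCC code: the names Strategy, Candidate and
choose_strategy are hypothetical, and the "reuse is cheaper than spilling"
flag stands in for whatever cost model the register allocator would apply.

```cpp
// Toy model of the four-step proposal. Not GCC code; every name here is
// hypothetical and exists only for illustration.
enum class Strategy {
    XorTestSetcc,          // steps 1-2: spare register, cleared with xor before the test
    TestSetccMovzx,        // step 4, former case: reuse one of the source registers
    SpillThenXorTestSetcc  // step 4, latter case: spill a victim, then xor/test/setcc
};

struct Candidate {
    bool spare_register_available;  // step 1: is there a free register for the result?
    bool reuse_source_is_cheaper;   // step 3: assumed cost verdict, reuse vs. spill
};

constexpr Strategy choose_strategy(Candidate c) {
    if (c.spare_register_available)
        return Strategy::XorTestSetcc;
    if (c.reuse_source_is_cheaper)
        return Strategy::TestSetccMovzx;
    return Strategy::SpillThenXorTestSetcc;
}

// A spare register always wins, regardless of the reuse/spill verdict.
static_assert(choose_strategy({true, true}) == Strategy::XorTestSetcc,
              "spare register: xor before the test");
```

The point of the ordering is that the source register is reused only when no
spare register exists, which matches the "only reuse when you have to" remark.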

[Bug c/96847] New: Code size increase +42% depending on memory size allocated on stack for ARM Cortex-M3

2020-08-29 Thread fredrik.hederstie...@securitas-direct.com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96847

Bug ID: 96847
   Summary: Code size increase +42% depending on memory size
allocated on stack for ARM Cortex-M3
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: fredrik.hederstie...@securitas-direct.com
  Target Milestone: ---

Created attachment 49156
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49156&action=edit
Example showing +42% increase depending on stack mem array sizes

When compiling with GCC-10.x.0 I get a code size increase that depends on the
size of memory for arrays on the stack.

The older GCC-9.x.0 does not show this size increase.

On a slightly constructed test-case from CSiBE bzip2 I get more than +42% size
increase.

Target: arm-none-eabi Cortex-M3

See the attached example: if I choose a stack mem array size 2 bytes smaller, I
get a totally different result. How can stack memory array sizes make this
difference, and why is this new with GCC-10.x?

[Bug c++/96848] New: Inherited conditionally explicit constructors via using declaration do not enforce explicitness if dependent on template parameter

2020-08-29 Thread northon_patrick3 at yahoo dot ca
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96848

Bug ID: 96848
   Summary: Inherited conditionally explicit constructors via
using declaration do not enforce explicitness if
dependent on template parameter
   Product: gcc
   Version: 10.2.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: northon_patrick3 at yahoo dot ca
  Target Milestone: ---

Example: https://godbolt.org/z/4x9r7e

```cpp
#include <iostream>
#include <type_traits>

struct E {};
struct F {};
struct G {};

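// NOTE: the template parameter and argument lists were lost in this archive
// copy. The ones below (T_, <E, F>, A<E>, B<E>, ...) are an assumed
// reconstruction that is consistent with the reported and expected output.
template <typename T_, typename OT_>
constexpr bool isValidImplicitConvert = false;
template <>
constexpr bool isValidImplicitConvert<E, F> = true;

template <typename T_>
struct A
{
    template <typename OT_>
    explicit(!isValidImplicitConvert<T_, OT_>) constexpr A(const OT_ &) {}

    //explicit(isValidImplicitConvert<T_, E>) constexpr A(E) {}
    //explicit(isValidImplicitConvert<T_, F>) constexpr A(F) {}
};

template <typename T_>
struct B : public A<T_>
{
    using A<T_>::A;
};

int main(int, char **)
{
    std::cout << std::is_convertible_v<F, A<E>> << '\n';
    std::cout << std::is_convertible_v<G, A<E>> << '\n';

    std::cout << std::is_convertible_v<F, B<E>> << '\n';
    std::cout << std::is_convertible_v<G, B<E>> << '\n';

    return 0;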
}
```

Output:
```
1
0
1
1
```

Expected output:
```
1
0
1
0
```

When inheriting constructors from a base class via a using declaration, if said
constructors are conditionally explicit and the condition is dependent on a
deduced template parameter, the explicit state is ignored.

Clang gives the expected output; however, I am not sure which compiler is
correct.

[Bug target/96127] ICE in extract_insn, at recog.c:2294

2020-08-29 Thread asolokha at gmx dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96127

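Arseny Solokha <asolokha at gmx dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |asolokha at gmx dot com

--- Comment #2 from Arseny Solokha <asolokha at gmx dot com> ---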
Is there any work pending?

[Bug target/96849] New: [11 Regression] ICE: in extract_insn, at recog.c:2294 (error: unrecognizable insn)

2020-08-29 Thread asolokha at gmx dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96849

Bug ID: 96849
   Summary: [11 Regression] ICE: in extract_insn, at recog.c:2294
(error: unrecognizable insn)
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Keywords: ice-on-valid-code
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: asolokha at gmx dot com
  Target Milestone: ---
Target: x86_64-unknown-linux-gnu-gcc

gcc-11.0.0-alpha20200823 snapshot (g:87c753ac241f25d222d46ba1ac66ceba89d6a200)
ICEs when compiling the following testcase, reduced from
gcc/testsuite/gcc.target/powerpc/p9-vec-length-epil-run-1.c, w/ -mavx512f -O1
-ftree-loop-vectorize:

double my[16];

void
lw (unsigned int dd)
{
  while (dd < 16)
    {
      my[dd] = dd;
      ++dd;
    }
}

% x86_64-unknown-linux-gnu-gcc-11.0.0 -mavx512f -O1 -ftree-loop-vectorize -c ppoh4ko6.c
ppoh4ko6.c: In function 'lw':
ppoh4ko6.c:11:1: error: unrecognizable insn:
   11 | }
  | ^
(insn 33 32 34 5 (set (reg:QI 116)
        (lt:QI (reg:V8DF 115)
            (reg:V8DF 113))) "ppoh4ko6.c":8:14 -1
     (nil))
during RTL pass: vregs
ppoh4ko6.c:11:1: internal compiler error: in extract_insn, at recog.c:2294
0x689824 _fatal_insn(char const*, rtx_def const*, char const*, int, char const*)
        /var/tmp/portage/sys-devel/gcc-11.0.0_alpha20200823/work/gcc-11-20200823/gcc/rtl-error.c:108
0x689840 _fatal_insn_not_found(rtx_def const*, char const*, int, char const*)
        /var/tmp/portage/sys-devel/gcc-11.0.0_alpha20200823/work/gcc-11-20200823/gcc/rtl-error.c:116
0x687d04 extract_insn(rtx_insn*)
        /var/tmp/portage/sys-devel/gcc-11.0.0_alpha20200823/work/gcc-11-20200823/gcc/recog.c:2294
0xa91c95 instantiate_virtual_regs_in_insn
        /var/tmp/portage/sys-devel/gcc-11.0.0_alpha20200823/work/gcc-11-20200823/gcc/function.c:1607
0xa91c95 instantiate_virtual_regs
        /var/tmp/portage/sys-devel/gcc-11.0.0_alpha20200823/work/gcc-11-20200823/gcc/function.c:1977
0xa91c95 execute
        /var/tmp/portage/sys-devel/gcc-11.0.0_alpha20200823/work/gcc-11-20200823/gcc/function.c:2026