[Bug web/96996] New: Missed optimzation for constant members of non-constant objects

2020-09-09 Thread matthijs at stdin dot nl
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96996

            Bug ID: 96996
           Summary: Missed optimzation for constant members of
                    non-constant objects
           Product: gcc
           Version: 10.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: web
          Assignee: unassigned at gcc dot gnu.org
          Reporter: matthijs at stdin dot nl
  Target Milestone: ---

When a global class instance is const and initialized using constant arguments
to a constexpr constructor, any member references are optimized away (using the
constant value rather than a lookup). However, when the object is *not* const,
but does have const members, such optimization does not happen.

$ cat test2.cpp; gcc -O3 -S -Wall -Wextra -fdump-tree-optimized=/dev/stdout test2.cpp
constexpr int v = 1;

struct Test {
  constexpr Test(int v, const int *p) : v(v), p(p) { }
  int const v;
  const int * const p;
};

const Test constant_test(v, &v);
Test non_constant_test(v, &v);

int constant_ref() {
  return constant_test.v + *constant_test.p;
}

int non_constant_ref() {
  return non_constant_test.v + *non_constant_test.p;
}

;; Function constant_ref (_Z12constant_refv, funcdef_no=3, decl_uid=2360, cgraph_uid=4, symbol_order=6)

constant_ref ()
{
   [local count: 1073741824]:
  return 2;

}



;; Function non_constant_ref (_Z16non_constant_refv, funcdef_no=4, decl_uid=2362, cgraph_uid=5, symbol_order=7)

non_constant_ref ()
{
  int _1;
  const int * _2;
  int _3;
  int _5;

   [local count: 1073741824]:
  _1 = non_constant_test.v;
  _2 = non_constant_test.p;
  _3 = *_2;
  _5 = _1 + _3;
  return _5;

}


In the constant_ref() case, the values are completely optimized away and the
return value is determined at compile time. In the non_constant_ref() case, the
values are retrieved at runtime.

However, AFAICS there should be no way that these values can be modified at
runtime, even when the object itself is not const, since the members are const.
So AFAICS, it should be possible to evaluate non_constant_ref() at compile time
as well.

Looking at the C++ spec (I'm quoting from the C++14 draft here), this would
seem to be possible as well.

[basic.type.qualifier] says "A const object is an object of type const T or a
non-mutable subobject of such an object."

If I read [intro.object] correctly, subobjects (such as non-static member
variables) are also objects, so a non-static member variable declared const
would be a "const object".

[dcl.type.cv] says "Except that any class member declared mutable (7.1.1) can
be modified, any attempt to modify a const object during its lifetime (3.8)
results in undefined behavior."

So, one can assume that the const member variable is not modified, because if
it was, that would be undefined behavior.

There is still the caveat of "during its lifetime": what if you destroy the
object and create a new one in its place? However, see
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80794#c5 for a discussion of this
case. In short, replacing non_constant_test with a new object is possible, but
only when it "does not contain any non-static data member whose type is
const-qualified or a reference type", which does not hold for this object. This
sounds like this provision was made for pretty much this case, even.
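To make the lifetime caveat concrete, here is a minimal sketch (C++17, because of `std::launder`). It destroys a `Test` shaped like the one above and constructs a new one in the same storage. The names `one`, `two` and `read_after_replace` are made up for illustration. Under the C++14/C++17 wording quoted here, a pointer to the old object cannot be used to reach the new one, because `Test` has const members. The pointer returned by placement new can be used, and so can a laundered copy of the old pointer.

```cpp
#include <cassert>
#include <new>  // placement new, std::launder (C++17)

// Same shape as the Test struct in the report.
struct Test {
  constexpr Test(int v, const int *p) : v(v), p(p) { }
  int const v;
  const int * const p;
};

constexpr int one = 1;
constexpr int two = 2;

// Hypothetical helper: create a Test, destroy it, then create a new Test
// in the same storage and read the new values.
int read_after_replace() {
  alignas(Test) unsigned char storage[sizeof(Test)];
  Test *first = new (storage) Test(one, &one);
  first->~Test();
  Test *second = new (storage) Test(two, &two);

  // Reading first->v here would be undefined behaviour under the C++14/C++17
  // [basic.life] wording: Test has const members, so `first` does not
  // automatically refer to the new object. These two reads are fine:
  int a = second->v + *second->p;   // pointer returned by placement new
  int b = std::launder(first)->v;   // laundered pointer to the new object

  second->~Test();
  return a + b;  // 2 + 2 + 2
}
```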

I suspect that the reason it works for the const object now is the
rules for constant expressions. [expr.const] defines rules for constant
expressions and seems to only allow using (through lvalue-to-rvalue conversion)
objects of non-integral types when they are constexpr. I can imagine that gcc
derives that constant_test might be effectively constexpr, making any
expressions that use it also effectively constant expressions. This same
derivation probably does not happen for subobjects (I guess "constexpr" is not
actually a concept that applies to subobjects at all). However, I think this
does not mean this optimization would be invalid, just that it would happen on
different grounds than the current optimization.
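For comparison, here is a sketch of what the constant-expression rules already allow. Declaring the object `constexpr`, rather than just `const`, guarantees that its members are usable in constant expressions, which a `static_assert` can check. `constexpr_test` and `constexpr_ref` are illustrative names. This only helps when the object itself may be const, which is exactly the restriction the report wants to avoid.

```cpp
#include <cassert>

constexpr int v = 1;

struct Test {
  constexpr Test(int v, const int *p) : v(v), p(p) { }
  int const v;
  const int * const p;
};

// constexpr (not just const): guarantees constant initialization and makes
// the members usable in constant expressions.
constexpr Test constexpr_test(v, &v);
static_assert(constexpr_test.v + *constexpr_test.p == 2,
              "evaluated at compile time");

// Still a plain, non-const object, as in the report.
Test non_constant_test(v, &v);

int constexpr_ref() {
  return constexpr_test.v + *constexpr_test.p;
}

int non_constant_ref() {
  return non_constant_test.v + *non_constant_test.p;
}
```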


This issue is also related to
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80794, but AFAICS that is more
about partial compilation and making assumptions about what an external
function can or cannot do, while this issue is primarily about link-time
(though maybe they are more similar internally, I don't exactly know).

I believe that this optimization would be quite significant to make, since it
allows better abstraction and separation of concerns (i.e. it allows writing a
class to be generic, using constructor-supplied parameters, but if you pass
constants for these parameters and have just a single instance of such a class,
or when methods are inlined or constprop'd, there could be zero runtime
overhead for this extra abstraction). Currently, I believe that you either have
to accept runtime overhead, or resort to usin

[Bug c++/96996] Missed optimzation for constant members of non-constant objects

2020-09-09 Thread matthijs at stdin dot nl
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96996

--- Comment #2 from Matthijs Kooijman  ---
> Replacing non_constant_test with a new object is possible, and allowed. But 
> the name "non_constant_test" cannot be used to refer to the new object, so 
> any calls to non_constant_ref() after the object was replaced would have 
> undefined behaviour. Which means the compiler can assume there are no such 
> calls.

Thanks for clarifying. But then I could reason that *if* "non_constant_test" is
replaced, then accessing it through the old name is undefined behavior, so that
would make any value for the constant member variables (such as the original
values before replacement) acceptable, right? Hence that does not conflict with
applying this optimization, I'd think.

[Bug c++/96996] Missed optimzation for constant members of non-constant objects

2020-09-10 Thread matthijs at stdin dot nl
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96996

--- Comment #5 from Matthijs Kooijman  ---
> But isn't there const_cast<> to change the value of p?

Yes, that makes it possible to write to a const object, but actually doing so
is undefined behavior (see [dcl.type.cv] I quoted above).

The spec even makes this explicit about const_cast, [expr.const.cast] says:

> [ Note: Depending on the type of the object, a write operation through
> the pointer, lvalue or pointer to data member resulting from a
> const_cast that casts away a const-qualifier may produce undefined
> behavior (7.1.6.1). — end note ]
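The following sketch illustrates the distinction the note draws. The `const_cast` itself is always allowed, and writing through its result is fine when the referenced object was not declared const. Only a write to a genuinely const object, such as a const member like `Test::v`, is undefined. `plain`, `write_through_const_cast` and `read_const_member` are illustrative names, and `Test` is trimmed to one member.

```cpp
#include <cassert>

// Trimmed-down version of the report's Test, with a single const member.
struct Test {
  constexpr Test(int v) : v(v) { }
  int const v;
};

int plain = 1;  // not a const object

int write_through_const_cast() {
  const int *cp = &plain;      // const view of a non-const object
  *const_cast<int *>(cp) = 2;  // OK: the object itself is not const
  return plain;
}

// Casting away const from a const member compiles, and reading through the
// result is fine. Writing through it would be undefined behaviour per
// [dcl.type.cv], so the write is only shown as a comment:
//   *const_cast<int *>(&t.v) = 2;  // undefined: t.v is a const subobject
int read_const_member() {
  Test t(1);
  int *p = const_cast<int *>(&t.v);
  return *p;
}
```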

[Bug tree-optimization/80794] constant objects can be assumed to be immutable

2020-09-14 Thread matthijs at stdin dot nl
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80794

--- Comment #10 from Matthijs Kooijman  ---
Also note that pr96996, that was marked as a duplicate of this report, talks
about a notable subcase of the case presented in this report. While this report
talks about constant complete objects (e.g. a variable marked const), pr96996
talks about const subobjects (e.g. a const member of a variable that might not
be const itself).

That PR gives some arguments for why such const subobjects can be
optimized in the same way as const complete objects. Maybe this is obvious to
seasoned gcc devs, but I wanted to point it out regardless, since the cases
seem distinct enough to me that one might end up fixing this for objects and
forgetting about subobjects :-)

[Bug preprocessor/80753] __has_include and __has_include_next taints subsequent I/O errors

2020-09-23 Thread matthijs at stdin dot nl
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80753

Matthijs Kooijman  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |matthijs at stdin dot nl

--- Comment #3 from Matthijs Kooijman  ---
I also ran into this; it is still a problem in gcc 10.0.1.

This is a particular problem in the Arduino environment, which relies on
preprocessor error messages to automatically pick libraries to include. This
bug prevents it from detecting that a particular library is needed and from
adding it to the include path, breaking a build that would otherwise have
worked.

[Bug preprocessor/80753] __has_include and __has_include_next taints subsequent I/O errors

2020-09-23 Thread matthijs at stdin dot nl
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80753

--- Comment #4 from Matthijs Kooijman  ---
I looked a bit at the source, and it seems the problem is (not
surprisingly) that `__has_include` causes the header filename to be put into
the cache, while an error message is only generated when the entire include path
has been processed without resolving the entry from the cache (essentially this
means that an error is only triggered when the entry is put into the cache,
*except* when that happens as part of `__has_include`).

The relevant function is _cpp_find_file:

https://github.com/gcc-mirror/gcc/blob/da13b7737662da11f8fefb28eaf4ed7c50c51767/libcpp/files.c#L506

This is called with kind = _cpp_FFK_HAS_INCLUDE for `__has_include` which
prevents an error here:

https://github.com/gcc-mirror/gcc/blob/da13b7737662da11f8fefb28eaf4ed7c50c51767/libcpp/files.c#L591-L592

And the cache entry is immediately returned on subsequent calls here:

https://github.com/gcc-mirror/gcc/blob/da13b7737662da11f8fefb28eaf4ed7c50c51767/libcpp/files.c#L523-L526

There are many lookups in the cache, and it took me a while to realize
how the cache actually works (I initially thought that maybe files that were
*not* found were not actually put in the cache), but AFAIU the
`pfile->file_hash` cache works like this:
 - Hash slots are indexed by filename.
 - Each slot contains a linked list of entries.
 - Each entry contains the directory lookups start from. This can be the
   start_dir, but also the first quote_include or bracket_include dir. IOW, the
   cache entry is only valid for a lookup that starts at that directory, or has
   progressed to that directory, to allow a "" include to prime the cache for a
   <> include as well.
 - Once a file is found, or the search path is exhausted, the result is stored
   in the cache for start_dir, and for "" and <> includes if the start dir for
   those has been passed.

Anyway, this means that a failed lookup from __has_include() will cause
the failed result to be put into the cache for one or more directories without
emitting an error and always be returned for subsequent includes without
emitting an error.
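To illustrate the consequence, here is a deliberately simplified C++ model; it is not libcpp code. `Preprocessor`, `find_file`, `Kind` and the single existing header `present.h` are all hypothetical. The cache is keyed by filename only, ignoring the per-directory entries described above. The diagnostic is only produced on the uncached path, so a failure first cached by `__has_include` is later returned silently.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-ins for _cpp_FFK_HAS_INCLUDE and a normal #include.
enum class Kind { Normal, HasInclude };

struct CachedFile {
  bool found;
};

struct Preprocessor {
  std::map<std::string, CachedFile> cache;  // simplified: keyed by filename only
  int errors = 0;                           // "file not found" diagnostics emitted

  // Pretend include-path search: only "present.h" exists.
  static CachedFile search(const std::string &name) {
    return CachedFile{name == "present.h"};
  }

  bool find_file(const std::string &name, Kind kind) {
    auto it = cache.find(name);
    if (it != cache.end())
      return it->second.found;  // cached result: no diagnostic on this path
    CachedFile f = search(name);
    if (!f.found && kind != Kind::HasInclude)
      ++errors;  // only emitted when the entry is first created
    cache[name] = f;
    return f.found;
  }
};
```

Probing `missing.h` with `Kind::HasInclude` first and then doing a normal lookup leaves `errors` at 0. A fresh `Preprocessor` that does only the normal lookup reports the error.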


An obvious fix would be to simply not put the result in the cache for
_cpp_FFK_HAS_INCLUDE, but that would be a missed cache opportunity.

An alternative would be to add a boolean "error_emitted" to each _cpp_file* in
the cache (cannot add it to the cache entry itself, since the same _cpp_file*
might end up in different cache entries), that defaults to false and is set to
true by open_file_failed. When kind is not _cpp_FFK_HAS_INCLUDE and a cache
entry is returned that has "error_emitted" set to false and has an errno set,
open_file_failed would be called to emit an error. This would require
open_file_failed to emit the right output in this case (i.e. the cached _cpp_file* should not rely
on the context in which it was generated), but AFAICS this would be the case.
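In the same kind of simplified model, the proposed fix could look roughly like this. Again, all names are hypothetical rather than libcpp's. A single cache entry per filename stands in for the `_cpp_file*` that may be shared between several real entries. Each cached file remembers whether an error was already reported. A normal lookup that hits a cached failure reports it, unless that has already happened.

```cpp
#include <cassert>
#include <map>
#include <string>

enum class Kind { Normal, HasInclude };

struct CachedFile {
  bool found;
  bool error_emitted = false;  // the proposed per-file flag
};

struct Preprocessor {
  std::map<std::string, CachedFile> cache;
  int errors = 0;

  // Stand-in for open_file_failed: emit the diagnostic and remember that.
  void open_file_failed(CachedFile &f) {
    ++errors;
    f.error_emitted = true;
  }

  bool find_file(const std::string &name, Kind kind) {
    auto it = cache.find(name);
    if (it == cache.end())  // pretend search: only "present.h" exists
      it = cache.emplace(name, CachedFile{name == "present.h"}).first;
    CachedFile &f = it->second;
    // Cached or not, a normal lookup of a missing file reports it once.
    if (!f.found && kind != Kind::HasInclude && !f.error_emitted)
      open_file_failed(f);
    return f.found;
  }
};
```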

I'm not quite familiar with building and patching gcc (and also really need to
get this yak hair back to my actual work), so I probably won't be providing a
patch here. Maybe my above analysis enables someone else to do so? :-)

[Bug c/39589] make -Wmissing-field-initializers=2 work with "designated initializers" ?

2019-11-22 Thread matthijs at stdin dot nl
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=39589

Matthijs Kooijman  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |matthijs at stdin dot nl

--- Comment #11 from Matthijs Kooijman  ---
It seems this was actually implemented at some point (at least for C++, maybe
that was the case all along already), though the manual page was not updated to
reflect this. Taking the example from the manual (which is documented to *not*
cause this warning):

matthijs@grubby:~$ cat foo.cpp 
struct s { int f, g, h; };
struct s x = { .f = 3, .g = 4 };
matthijs@grubby:~$ gcc foo.cpp -c  -Wall -Wextra
foo.cpp:2:31: warning: missing initializer for member ‘s::h’ [-Wmissing-field-initializers]
 struct s x = { .f = 3, .g = 4 };
                               ^

However, this seems to be the case only for C++: if I rename the file to foo.c,
no warning is emitted.


I actually came here looking for a way to *disable* this warning for designated
initializers on a specific struct. I was hoping to use a struct with designated
initializers as an elegant way to specify configuration with optional fields
(e.g. by letting any unspecified fields be initialized to 0 and filling in a
default value for them). However, when any caller that omits a field to get the
default value is pestered with a warning, that approach does not really work
well. On the other hand, disabling the warning completely with a commandline
option or pragma seems heavy-handed, since I do consider this a useful warning
in many other cases.
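Here is a sketch of the configuration pattern described above. It assumes designated initializers, which are standard since C++20 and which GCC accepts as an extension in earlier C++ modes. Omitted fields are zero-initialized, and the code maps 0 to a default. `SerialConfig`, its fields and the 1000 ms default are invented for illustration. An initializer like `uart_cfg` is the kind that drew a -Wmissing-field-initializers warning in the C++ test above.

```cpp
#include <cassert>

// Hypothetical configuration struct; 0 means "use the default".
struct SerialConfig {
  unsigned long baud;
  unsigned timeout_ms;
};

// Replace an unset (zero) timeout with a default value.
unsigned effective_timeout(const SerialConfig &cfg) {
  const unsigned default_timeout_ms = 1000;  // arbitrary example default
  return cfg.timeout_ms != 0 ? cfg.timeout_ms : default_timeout_ms;
}

// timeout_ms is omitted, so it is zero-initialized.
const SerialConfig uart_cfg = {.baud = 9600};
```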

[Bug lto/83967] LTO removes C functions declared as weak in assembler(depending on files order in linking)

2019-11-26 Thread matthijs at stdin dot nl
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83967

Matthijs Kooijman  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |matthijs at stdin dot nl

--- Comment #14 from Matthijs Kooijman  ---
I actually think this is a different problem from the one fixed in
https://sourceware.org/bugzilla/show_bug.cgi?id=22502. Using gcc 8.2.1 and
binutils 2.31.51.20181213 from the STM32 Arduino core
(https://github.com/stm32duino/Arduino_Core_STM32), I can still reproduce this
problem using the example from comment 11 (and also in an actual implementation
using stm32duino). I also tested the example from the linked bug, which *is*
indeed fixed, leading me to believe this is a different problem (or the fix is
not complete yet).

The example from this bug is a lot bigger than the one from 22502, so there is
probably something in here that triggers this. Maybe that the weak
implementation is defined in assembly rather than C?

[Bug target/92693] New: Inconsistency between __UINTPTR_TYPE__ and __UINT32_TYPE__ on ARM

2019-11-27 Thread matthijs at stdin dot nl
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92693

            Bug ID: 92693
           Summary: Inconsistency between __UINTPTR_TYPE__ and
                    __UINT32_TYPE__ on ARM
           Product: gcc
           Version: 7.3.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: matthijs at stdin dot nl
  Target Milestone: ---

GCC defines a number of macros for types, which are used by stdint.h to define
the corresponding typedefs. In particular, I'm looking at uintptr_t. On ARM,
this is 32 bits wide and equals unsigned int:

#define __UINTPTR_TYPE__ unsigned int

In my code, I was running into problems (an ambiguous function call) trying to
pass a uintptr_t to a function that has overloads for uint8_t, uint16_t and
uint32_t. Investigating, it turns out that uint32_t is defined as long
unsigned int:

#define __UINT32_TYPE__ long unsigned int

I would expect that, since both types are 32 bits wide, they would actually
resolve to the same type. This would also make overload resolution work as
expected. Is there any reason for this inconsistency, or could it be fixed?

To test this, I installed gcc-arm-none-eabi, version 15:7-2018-q2-6, from
Ubuntu Disco (the same version should be in Debian testing):

$ arm-none-eabi-gcc --version
arm-none-eabi-gcc (15:7-2018-q2-6) 7.3.1 20180622 (release) [ARM/embedded-7-branch revision 261907]
Copyright (C) 2017 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

$ arm-none-eabi-gcc -dM -E -x c++ /dev/null |egrep '(UINTPTR_TYPE|UINT32_TYPE)'
#define __UINT32_TYPE__ long unsigned int
#define __UINTPTR_TYPE__ unsigned int

I see the same problem using gcc 8.2.1 shipped with the STM32 arduino core
(https://github.com/stm32duino/Arduino_Core_STM32).

To illustrate the original problem I was seeing, here's a small testcase:

$ cat foo.cpp
#include <stdint.h>

void func(uint16_t);
void func(uint32_t);

int main() {
  func((uintptr_t)nullptr);
  static_assert(sizeof(uintptr_t) == sizeof(uint32_t), "Sizes not equal");
}

$ arm-none-eabi-gcc -c foo.cpp 
foo.cpp: In function 'int main()':
foo.cpp:7:25: error: call of overloaded 'func(uintptr_t)' is ambiguous
  func((uintptr_t)nullptr);
                         ^
foo.cpp:3:6: note: candidate: void func(uint16_t)
 void func(uint16_t);
      ^~~~
foo.cpp:4:6: note: candidate: void func(uint32_t)
 void func(uint32_t);
      ^~~~

[Bug target/92693] Inconsistency between __UINTPTR_TYPE__ and __UINT32_TYPE__ on ARM

2019-11-27 Thread matthijs at stdin dot nl
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92693

--- Comment #3 from Matthijs Kooijman  ---
> I don't see why you should expect that, there's nothing in the standards 
> suggesting it should be the case.

This is true; the current behaviour is standards-compliant AFAICS. However, I
expected it because it would be consistent, and would make things behave with
the least surprise (at least for the use case I described).

> Changing it would be an ABI change, so seems like a bad idea.

Good point.

I did a bit more searching and found this Linux kernel patch. The commit
message suggests that it might at some point have been consistent:

https://patchwork.kernel.org/patch/2845139/

I assume that "bare metal GCC" would refer to the __xxx_TYPE__ macros, or at
least to whatever you get when you include <stdint.h>.

> N.B. you get exactly the same overload failure if you call func(1u). The 
> problem is your overload set, not the definition of uintptr_t.

Fair point, though I think that it is hard to define a proper overload set
here. In my case, I'm defining functions to print various sizes of integers.
Because the body of the function needs to know how big the type is, I'm using
the uintxx_t types to define them. I could of course define the function for
(unsigned) char, short, int, long, long long, but then I can't make any
assumptions about the exact size of each (I could use sizeof and make a generic
implementation, but I wanted to keep things simple and use a different
implementation for each size).
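One way to get the generic version mentioned above, while keeping a separate implementation per size, is a single template that dispatches on `sizeof(T)` instead of on the exact type. Then `uintptr_t`, `unsigned int` and `unsigned long` all work, no matter which of them `uint32_t` happens to be a typedef for. This is only a sketch. The `print_*` names are hypothetical, and the per-size bodies just format the value with a width tag to keep the example testable.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <type_traits>

// Hypothetical per-size implementations (real code would print the value).
std::string print_u8(std::uint8_t x)   { return "8:" + std::to_string(x); }
std::string print_u16(std::uint16_t x) { return "16:" + std::to_string(x); }
std::string print_u32(std::uint32_t x) { return "32:" + std::to_string(x); }
std::string print_u64(std::uint64_t x) { return "64:" + std::to_string(x); }

// Accepts any unsigned integer type and selects the implementation by size,
// so no overload set over the exact typedefs is needed.
template <typename T>
std::string print_unsigned(T x) {
  static_assert(std::is_unsigned<T>::value, "unsigned integer types only");
  static_assert(sizeof(T) <= 8, "at most 64 bits supported");
  if (sizeof(T) == 1) return print_u8(static_cast<std::uint8_t>(x));
  if (sizeof(T) == 2) return print_u16(static_cast<std::uint16_t>(x));
  if (sizeof(T) == 4) return print_u32(static_cast<std::uint32_t>(x));
  return print_u64(static_cast<std::uint64_t>(x));
}
```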

I guess this might boil down to C/C++ being annoying when it comes to integer
types, and not something GCC can really fix (though it *would* have been more
convenient if this had been consistent from the start).

Feel free to close if that seems appropriate.

[Bug tree-optimization/93359] New: Miscompile (loop check omitted) in function with missing return statement

2020-01-21 Thread matthijs at stdin dot nl
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93359

            Bug ID: 93359
           Summary: Miscompile (loop check omitted) in function with
                    missing return statement
           Product: gcc
           Version: 9.1.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: matthijs at stdin dot nl
  Target Milestone: ---

I came across a miscompile, where a missing return statement in a function
resulted in a simple for loop never terminating. I originally found this in an
STM32 ARM Arduino environment, but managed to reduce this to just a few lines
of code and found that it also occurs on x86_64.

Here's the testcase:

   matthijs@grubby:~$ cat foo.cpp
   #include <stdint.h>

   volatile uint8_t *REG = 0x0;

   uint8_t foo() {
     for (int i = 0; i < 16; i++)
       *REG = i;
   }

   int main() {
     foo();
   }

I used a volatile "register" write here just to have something in the
loop that will not be optimized away, I originally had actual code in
there.

Compiling this results in a loop that never terminates:

   matthijs@grubby:~$ gcc-8 -Os -c  foo.cpp
   foo.cpp: In function ‘uint8_t foo()’:
   foo.cpp:8:1: warning: no return statement in function returning non-void [-Wreturn-type]
    }
    ^
   matthijs@grubby:~$ objdump -S foo.o

   foo.o: file format elf64-x86-64


   Disassembly of section .text:

    <_Z3foov>:
      0:   31 c0                   xor    %eax,%eax
      2:   48 8b 15 00 00 00 00    mov    0x0(%rip),%rdx        # 9 <_Z3foov+0x9>
      9:   88 02                   mov    %al,(%rdx)
      b:   ff c0                   inc    %eax
      d:   eb f3                   jmp    2 <_Z3foov+0x2>

   Disassembly of section .text.startup:

    <main>:
      0:   e8 00 00 00 00          callq  5 <main+0x5>

I get identical output on gcc 9, but gcc 7 produces code as expected:

   matthijs@grubby:~$ gcc-7 -Os -c  foo.cpp
   matthijs@grubby:~$ objdump -S foo.o

   foo.o: file format elf64-x86-64


   Disassembly of section .text:

    <_Z3foov>:
      0:   31 c0                   xor    %eax,%eax
      2:   48 8b 15 00 00 00 00    mov    0x0(%rip),%rdx        # 9 <_Z3foov+0x9>
      9:   88 02                   mov    %al,(%rdx)
      b:   ff c0                   inc    %eax
      d:   83 f8 10                cmp    $0x10,%eax
     10:   75 f0                   jne    2 <_Z3foov+0x2>
     12:   c3                      retq

   Disassembly of section .text.startup:

    <main>:
      0:   e8 00 00 00 00          callq  5 <main+0x5>
      5:   31 c0                   xor    %eax,%eax
      7:   c3                      retq

Also, running with -O0 produces working code:

   matthijs@grubby:~$ gcc-8 -O0 -c  foo.cpp
   foo.cpp: In function ‘uint8_t foo()’:
   foo.cpp:8:1: warning: no return statement in function returning non-void [-Wreturn-type]
    }
    ^
   matthijs@grubby:~$ objdump -S foo.o

   foo.o: file format elf64-x86-64


   Disassembly of section .text:

    <_Z3foov>:
      0:   55                      push   %rbp
      1:   48 89 e5                mov    %rsp,%rbp
      4:   c7 45 fc 00 00 00 00    movl   $0x0,-0x4(%rbp)
      b:   83 7d fc 0f             cmpl   $0xf,-0x4(%rbp)
      f:   7f 12                   jg     23 <_Z3foov+0x23>
     11:   48 8b 05 00 00 00 00    mov    0x0(%rip),%rax        # 18 <_Z3foov+0x18>
     18:   8b 55 fc                mov    -0x4(%rbp),%edx
     1b:   88 10                   mov    %dl,(%rax)
     1d:   83 45 fc 01             addl   $0x1,-0x4(%rbp)
     21:   eb e8                   jmp    b <_Z3foov+0xb>
     23:   90                      nop
     24:   5d                      pop    %rbp
     25:   c3                      retq

   0026 <main>:
     26:   55                      push   %rbp
     27:   48 89 e5                mov    %rsp,%rbp
     2a:   e8 00 00 00 00          callq  2f <main+0x9>
     2f:   b8 00 00 00 00          mov    $0x0,%eax
     34:   5d                      pop    %rbp
     35:   c3                      retq

This seems C++-specific: when I rename foo.cpp to foo.c and compile it, it
produces output as expected.
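The C/C++ difference matches a language rule. In C++, flowing off the end of a non-void function is undefined behaviour as soon as it happens. In C, it is only undefined if the caller uses the returned value. So the C++ optimizer may treat the path that leaves the loop as unreachable. On the user side, the fix is simply to return a value. The sketch below replaces the address-0 register with a hypothetical `fake_reg` variable so that it can run on a hosted system.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the memory-mapped register at address 0.
volatile std::uint8_t fake_reg = 0;
volatile std::uint8_t *REG = &fake_reg;

// Same loop as in the report, but with an explicit return, so control never
// flows off the end of a non-void function.
std::uint8_t foo() {
  for (int i = 0; i < 16; i++)
    *REG = static_cast<std::uint8_t>(i);
  return *REG;  // last value written: 15
}
```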

Here are the compiler versions I used; these are just the plain Ubuntu x86_64
versions:

   matthijs@grubby:~$ gcc-7 -v
   Using built-in specs.
   COLLECT_GCC=gcc-7
   COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/7/lto-wrapper
   OFFLOAD_TARGET_NAMES=nvptx-none
   OFFLOAD_TARGET_DEFAULT=1
   Target: x86_64-linux-gnu
   Configured with: ../src/configure -v --with-pkgversion='Ubuntu
7.4.0-8ubuntu1' --with-bugurl=file:///usr/share/doc/gcc-7/README.Bugs
--enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++ --prefix=/usr
--with-gcc-major-version-only --program-suffix=-7
--program-prefix=x86_64-linux-gnu- --ena

[Bug target/56533] New: Linker problem on avr with lto and main function inside archive

2013-03-05 Thread matthijs at stdin dot nl
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56533

             Bug #: 56533
           Summary: Linker problem on avr with lto and main function
                    inside archive
    Classification: Unclassified
           Product: gcc
           Version: 4.7.2
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
        AssignedTo: unassig...@gcc.gnu.org
        ReportedBy: matth...@stdin.nl

When trying to add lto to my Arduino program, it stopped compiling, complaining
about missing symbols. I've managed to reduce the problem to the minimal
example below. Note that removing anything from this example makes the problem
disappear. In particular, the problem disappears when:

 * any of the linker options is removed: -mmcu=atmega328p -Os -flto
   -fwhole-program
 * the -flto compiler option is removed
 * using normal gcc (amd64) instead of avr-gcc
 * linking main.o instead of main.a
 * declaring realmain as externally_visible in realmain.c

Note that in this example, the actual main() function is inside an archive,
which is probably the reason for this bug / problem.

$ avr-gcc --version
avr-gcc (GCC) 4.7.2
Copyright (C) 2012 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

$ avr-ld --version
GNU ld (GNU Binutils) 2.20.1.20100303
Copyright 2009 Free Software Foundation, Inc.
This program is free software; you may redistribute it under the terms of
the GNU General Public License version 3 or (at your option) a later
version.
This program has absolutely no warranty.

$ cat main.c
int realmain(void);

int main(void)
{
    return realmain();
}

$ cat realmain.c
int realmain(void) {
}

$ cat do
#!/bin/sh
set -x

rm -f main.a

/usr/bin/avr-gcc -c main.c -o main.o
/usr/bin/avr-ar rcs main.a  main.o
/usr/bin/avr-gcc -c -flto realmain.c -o realmain.o
/usr/bin/avr-gcc -mmcu=atmega328p -Os -flto -fwhole-program realmain.o main.a

$ ./do
+ rm -f main.a
+ /usr/bin/avr-gcc -c main.c -o main.o
+ /usr/bin/avr-ar rcs main.a main.o
+ /usr/bin/avr-gcc -c -flto realmain.c -o realmain.o
+ /usr/bin/avr-gcc -mmcu=atmega328p -Os -flto -fwhole-program realmain.o main.a
main.a(main.o): In function `main':
main.c:(.text+0x8): undefined reference to `realmain'
collect2: error: ld returned 1 exit status

[Bug target/56533] Linker problem on avr with lto and main function inside archive

2013-03-05 Thread matthijs at stdin dot nl
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56533

--- Comment #2 from Matthijs Kooijman  2013-03-05 12:55:47 UTC ---
+ /usr/bin/avr-gcc -v -mmcu=atmega328p -Os -flto -fwhole-program realmain.o main.a
Using built-in specs.
COLLECT_GCC=/usr/bin/avr-gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/avr/4.7.2/lto-wrapper
Target: avr
Configured with: ../src/configure -v --enable-languages=c,c++ --prefix=/usr/lib
--infodir=/usr/share/info --mandir=/usr/share/man --bindir=/usr/bin
--libexecdir=/usr/lib --libdir=/usr/lib --enable-shared --with-system-zlib
--enable-long-long --enable-nls --without-included-gettext --disable-libssp
--build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=avr
Thread model: single
gcc version 4.7.2 (GCC)
COMPILER_PATH=/usr/lib/gcc/avr/4.7.2/:/usr/lib/gcc/avr/4.7.2/:/usr/lib/gcc/avr/:/usr/lib/gcc/avr/4.7.2/:/usr/lib/gcc/avr/:/usr/lib/gcc/avr/4.7.2/../../../avr/bin/
LIBRARY_PATH=/usr/lib/gcc/avr/4.7.2/avr5/:/usr/lib/gcc/avr/4.7.2/../../../avr/lib/avr5/:/usr/lib/gcc/avr/4.7.2/:/usr/lib/gcc/avr/4.7.2/../../../avr/lib/
COLLECT_GCC_OPTIONS='-v' '-mmcu=atmega328p' '-Os' '-flto' '-fwhole-program'
 /usr/lib/gcc/avr/4.7.2/collect2 -flto -m avr5 -Tdata 0x800100
/usr/lib/gcc/avr/4.7.2/../../../avr/lib/avr5/crtm328p.o
-L/usr/lib/gcc/avr/4.7.2/avr5 -L/usr/lib/gcc/avr/4.7.2/../../../avr/lib/avr5
-L/usr/lib/gcc/avr/4.7.2 -L/usr/lib/gcc/avr/4.7.2/../../../avr/lib realmain.o
main.a -lgcc -lc -lgcc
 /usr/bin/avr-gcc @/tmp/ccYrSTvi.args
Using built-in specs.
COLLECT_GCC=/usr/bin/avr-gcc
Target: avr
Configured with: ../src/configure -v --enable-languages=c,c++ --prefix=/usr/lib
--infodir=/usr/share/info --mandir=/usr/share/man --bindir=/usr/bin
--libexecdir=/usr/lib --libdir=/usr/lib --enable-shared --with-system-zlib
--enable-long-long --enable-nls --without-included-gettext --disable-libssp
--build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=avr
Thread model: single
gcc version 4.7.2 (GCC)
COLLECT_GCC_OPTIONS='-c' '-v' '-mmcu=atmega328p' '-Os' '-fwhole-program'
'-fltrans-output-list=/tmp/ccZEu3t3.ltrans.out' '-fwpa'
 /usr/lib/gcc/avr/4.7.2/lto1 -quiet -dumpbase realmain.o -mmcu=atmega328p
-auxbase realmain -Os -version -fwhole-program
-fltrans-output-list=/tmp/ccZEu3t3.ltrans.out -fwpa @/tmp/ccOsOe32
GNU GIMPLE (GCC) version 4.7.2 (avr)
compiled by GNU C version 4.7.2, GMP version 5.0.5, MPFR version
3.1.0-p10, MPC version 0.9
GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
GNU GIMPLE (GCC) version 4.7.2 (avr)
compiled by GNU C version 4.7.2, GMP version 5.0.5, MPFR version
3.1.0-p10, MPC version 0.9
GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
COMPILER_PATH=/usr/lib/gcc/avr/4.7.2/:/usr/lib/gcc/avr/4.7.2/:/usr/lib/gcc/avr/:/usr/lib/gcc/avr/4.7.2/:/usr/lib/gcc/avr/:/usr/lib/gcc/avr/4.7.2/../../../avr/bin/:/usr/lib/gcc/avr/4.7.2/:/usr/lib/gcc/avr/4.7.2/:/usr/lib/gcc/avr/:/usr/lib/gcc/avr/4.7.2/:/usr/lib/gcc/avr/:/usr/lib/gcc/avr/4.7.2/../../../avr/bin/
LIBRARY_PATH=/usr/lib/gcc/avr/4.7.2/avr5/:/usr/lib/gcc/avr/4.7.2/../../../avr/lib/avr5/:/usr/lib/gcc/avr/4.7.2/:/usr/lib/gcc/avr/4.7.2/../../../avr/lib/
COLLECT_GCC_OPTIONS='-c' '-v' '-mmcu=atmega328p' '-Os' '-fwhole-program'
'-fltrans-output-list=/tmp/ccZEu3t3.ltrans.out' '-fwpa'
 /usr/bin/avr-gcc @/tmp/ccoysJBM.args
Using built-in specs.
COLLECT_GCC=/usr/bin/avr-gcc
Target: avr
Configured with: ../src/configure -v --enable-languages=c,c++ --prefix=/usr/lib
--infodir=/usr/share/info --mandir=/usr/share/man --bindir=/usr/bin
--libexecdir=/usr/lib --libdir=/usr/lib --enable-shared --with-system-zlib
--enable-long-long --enable-nls --without-included-gettext --disable-libssp
--build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=avr
Thread model: single
gcc version 4.7.2 (GCC)
COLLECT_GCC_OPTIONS='-c' '-v' '-mmcu=atmega328p' '-Os' '-fwhole-program'
'-fltrans-output-list=/tmp/ccZEu3t3.ltrans.out' '-fltrans' '-o'
'/tmp/ccZEu3t3.ltrans0.ltrans.o'
 /usr/lib/gcc/avr/4.7.2/lto1 -quiet -dumpbase ccZEu3t3.ltrans0.o
-mmcu=atmega328p -auxbase-strip /tmp/ccZEu3t3.ltrans0.ltrans.o -Os -version
-fwhole-program -fltrans-output-list=/tmp/ccZEu3t3.ltrans.out -fltrans
@/tmp/ccAudUT3 -o /tmp/ccyDScYi.s
GNU GIMPLE (GCC) version 4.7.2 (avr)
compiled by GNU C version 4.7.2, GMP version 5.0.5, MPFR version
3.1.0-p10, MPC version 0.9
GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
GNU GIMPLE (GCC) version 4.7.2 (avr)
compiled by GNU C version 4.7.2, GMP version 5.0.5, MPFR version
3.1.0-p10, MPC version 0.9
GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
COLLECT_GCC_OPTIONS='-c' '-v' '-mmcu=atmega328p' '-Os' '-fwhole-program'
'-fltrans-output-list=/tmp/ccZEu3t3.ltrans.out' '-fltrans' '-o'
'/tmp/ccZEu3t3.ltrans0.ltrans.o'
 /usr/lib/gcc/avr/4.7.2/../../../avr/bin/as -mmcu=atmega328p -mno-skip-bug -o
/tmp/ccZEu3t3.ltrans0.ltrans.o /tmp/ccyDScYi.s
COMPILER_PAT

[Bug target/56533] Linker problem on avr with lto and main function inside archive

2013-03-05 Thread matthijs at stdin dot nl
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56533

--- Comment #3 from Matthijs Kooijman  2013-03-05 13:06:18 UTC ---
It seems I made a wrong observation in my original report: when I link main.o
instead of main.a, the problem does _not_ go away. In fact, I can then remove a
few more flags while still keeping the problem around:

$ ./do
+ rm -f main.a main.o realmain.o
+ /usr/bin/avr-gcc -c main.c -o main.o
+ /usr/bin/avr-gcc -c -flto realmain.c -o realmain.o
+ /usr/bin/avr-gcc -flto -fwhole-program realmain.o main.o
main.o: In function `main':
main.c:(.text+0x8): undefined reference to `realmain'
collect2: error: ld returned 1 exit status

main.c and realmain.c are the same as before.

However, adding -flto to the main.c compilation makes the problem disappear
again. I suspect this means that without -flto, main.o is passed straight
to the linker, while with -flto it is included in link-time optimization, which
would mean your previous analysis still holds.

$ ./do
+ rm -f *.a main.o realmain.o
+ /usr/bin/avr-gcc -c -flto main.c -o main.o
+ /usr/bin/avr-gcc -c -flto realmain.c -o realmain.o
+ /usr/bin/avr-gcc -flto -fwhole-program realmain.o main.o

[Bug target/56533] Linker problem on avr with lto and main function inside archive

2013-03-05 Thread matthijs at stdin dot nl
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56533

Matthijs Kooijman  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |UNCONFIRMED
         Resolution|INVALID                     |

--- Comment #5 from Matthijs Kooijman  2013-03-05 14:38:36 UTC ---
Just for future reference, the problem here seems to be that I'm using
-fwhole-program, but GCC's LTO cannot actually see the whole program. In
particular, .a archives and .o object files that were compiled without -flto
are passed directly to the linker and not included in LTO. Since
-fwhole-program makes the compiler assume that the files included in LTO
make up the whole program, the compiler removes symbols that look unused,
but which then turn up missing in the final link.

So, I shouldn't have been using -fwhole-program, or I should be aware of the
above and set externally_visible attributes as needed if I insist on using
-fwhole-program.

Ideally, the compiler would ask the linker which symbols are used in
these "non-LTO" objects, which is what -fuse-linker-plugin does (it is implied
by -flto). However, on the AVR target, it seems there is no linker plugin (at
least not in this particular case), which means that without -fwhole-program,
the compiler cannot optimize as much (since it has to assume that all symbols
are externally visible).
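The externally_visible workaround mentioned above can be sketched as follows. The report does not show realmain.c, so the function body here is hypothetical; only the attribute placement is the point.

```c
/* realmain.c: hypothetical stand-in for the report's file, compiled with
 * "avr-gcc -c -flto realmain.c -o realmain.o".
 *
 * Under -fwhole-program, LTO only sees realmain.o, so realmain() looks
 * unused (its caller lives in main.o, which was built without -flto) and
 * gets removed. externally_visible cancels the effect of -fwhole-program
 * for this one symbol, so it survives until the final link. */
__attribute__((externally_visible))
int realmain(void)
{
    return 0;
}
```

With the attribute in place, the failing link line from comment #3 (`avr-gcc -flto -fwhole-program realmain.o main.o`) should be able to resolve `realmain` from main.o, while every other symbol in the LTO part still gets whole-program treatment.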


[Bug target/56533] Linker problem on avr with lto and main function inside archive

2013-03-05 Thread matthijs at stdin dot nl

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56533

Matthijs Kooijman  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution||INVALID

--- Comment #6 from Matthijs Kooijman  2013-03-05 14:40:15 UTC ---
w00ps, didn't mean to change the resolution.


[Bug rtl-optimization/51447] [4.7 Regression] global register variable definition incorrectly removed as dead code

2013-09-10 Thread matthijs at stdin dot nl
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51447

Matthijs Kooijman  changed:

   What|Removed |Added

 CC||matthijs at stdin dot nl

--- Comment #19 from Matthijs Kooijman  ---
In case anyone else comes across here and wonders: This fix made it into 4.8,
but was not backported into 4.7.3.

Regarding the bug description that says "4.7 regression", I have also observed
this bug on avr-gcc 4.3.3, so it's not a regression introduced in 4.7.

I also noticed this bug on the AVR platform, using 4.7.2. Just in case it helps
(perhaps for others to find this bug when Googling for avr-gcc), here's the
testcase and bugreport I was preparing before I found this one.

  //
  // Under some circumstances, writes to a global register variable are
  // optimized away, even though that changes behaviour. The example below
  // illustrates this.
  //
  // When compiled as-is, the writes to the variable "global" are removed.
  // However, when compiling with -DDO_CALL, which adds a function call to
  // the main function, the writes are preserved. This leads me to believe
  // that the optimizer sees that main() isn't calling any functions, so
  // it must be safe to just remove the writes (even though the documentation
  // [1] says "Stores into this register are never deleted even if they
  // appear to be dead, but references may be deleted or moved or
  // simplified.")
  //
  // It seems that a second condition (in addition to no functions being
  // called) is that the main function does not return. If we add a return
  // path, the writes show up again.
  //
  // However, removing these writes does not seem sane, since there is
  // also an interrupt routine, which can access the variable, but the
  // optimizer is apparently not aware that this is a possibility.
  //
  // Tested using:
  // avr-gcc -mmcu=attiny13 register.c -S -o - -O
  // avr-gcc -mmcu=attiny13 register.c -S -o - -O -DDO_CALL
  // avr-gcc -mmcu=attiny13 register.c -S -o - -O -DDO_RETURN
  //
  // [1]: http://gcc.gnu.org/onlinedocs/gcc/Global-Reg-Vars.html#Global-Reg-Vars

  #include <avr/io.h>
  #include <avr/cpufunc.h>
  #include <avr/interrupt.h>  // provides the ISR() macro

  // Define a global variable in a register
  register char global asm("r16");

  // Just a dummy function
  void foo()
  {
    // Add some nops so this function doesn't get inlined
    _NOP(); _NOP(); _NOP();
  }

  // Define an ISR that accesses the global. This doesn't actually seem to
  // make a difference, except that if this wasn't here, removing writes to
  // the global would be acceptable
  ISR(INT0_vect)
  {
    PORTB = global;
  }

  void main()
  {
    global = 1;
    while(1) {
  #ifdef DO_CALL
      foo();
  #endif
  #ifdef DO_RETURN
      return;
  #endif
    }
  }


[Bug c++/78609] invalid member's visibility detection in constexpr

2018-01-11 Thread matthijs at stdin dot nl
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78609

Matthijs Kooijman  changed:

   What|Removed |Added

 CC||matthijs at stdin dot nl

--- Comment #2 from Matthijs Kooijman  ---
I also ran into this problem. It seems that gcc somehow inlines c_str() (or
rather, evaluates the constexpr variable it is assigned to) before visibility
checks (possibly because the constexpr is evaluated before template
initialization?).

Below is a smaller example, which is confirmed broken up to gcc 8. I could not
reduce this example any further, so it seems the essential pattern that
triggers this is:
 - There is a class instance in a constexpr variable with static storage
duration
 - A pointer to a private member of this object is accessed through a method
 - This pointer is assigned to a constexpr variable
 - This pointer is assigned in a template instantiation

Here's the code:

  class foo {
    char c;
  public:
    //constexpr foo(): c(1) { }
    //constexpr foo(char c): c(c) { }
    constexpr const char* c_str() const { return &c; }
  };

  constexpr foo basename = foo(); // Fails
  // These also fail, if you add the appropriate constructor above
  //static constexpr foo basename = foo(1); // Fails
  //static constexpr foo basename(1); // Fails
  //static constexpr foo basename{1}; // Fails
  //static constexpr foo basename{}; // Fails
  // Surprisingly this works (but needs a constructor above):
  //static constexpr foo basename; // Works

  template <typename T>
  void call() {
    // This is where it breaks
    constexpr const char *unused = basename.c_str();
  }

  int main() {
    // Instantiate the call function
    call<int>();
  }

  // Removing the template argument on T makes it work
  // Letting T be deduced by adding an argument to call() also fails
  // Making the "unused" variable non-constexpr makes it work
  // Making c_str() return c instead of &c makes it work
  // Making "basename" a static variable inside call() also fails
  //
  // Tested on avr-gcc 4.9.2, gcc Debian 6.3.0-18, gcc Debian
  // 7.2.0-19, gcc Debian 8-20180110-1


$ avr-gcc ATest.cpp -std=c++11
ATest.cpp: In instantiation of ‘void call() [with T = int]’:
ATest.cpp:26:13:   required from here
ATest.cpp:2:10: error: ‘char foo::c’ is private
 char c;
  ^
ATest.cpp:21:49: error: within this context
   constexpr const char *unused = basename.c_str();
 ^

[Bug preprocessor/51259] no escape on control characters on linemarker lines

2016-05-19 Thread matthijs at stdin dot nl
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51259

Matthijs Kooijman  changed:

   What|Removed |Added

 CC||matthijs at stdin dot nl

--- Comment #2 from Matthijs Kooijman  ---
I just stumbled upon this same issue. The most direct course seems to be to fix
the documentation to match the implementation.

In his comment, Shakthi Kannan runs the gcc -E output through hexdump -c and
says that the octal value is present there, but that's just hexdump showing
any non-printable character in the file as an octal value. The raw byte is
still present in the gcc output.

To confirm, here's another testcase:

$ touch $'foo\001bar.cpp'
$ gcc -E foo^Abar.cpp
  # 1 "foobar.cpp"
  # 1 "<built-in>"
  # 1 "<command-line>"
  # 1 "/usr/include/stdc-predef.h" 1 3 4
  # 1 "<command-line>" 2
  # 1 "foobar.cpp"
$ gcc -E foo^Abar.cpp |hd
  00  23 20 31 20 22 66 6f 6f  01 62 61 72 2e 63 70 70  |# 1 "foo.bar.cpp|
  10  22 0a 23 20 31 20 22 3c  62 75 69 6c 74 2d 69 6e  |".# 1 "<built-in|
  20  3e 22 0a 23 20 31 20 22  3c 63 6f 6d 6d 61 6e 64  |>".# 1 "<command|
  30  2d 6c 69 6e 65 3e 22 0a  23 20 31 20 22 2f 75 73  |-line>".# 1 "/us|
  40  72 2f 69 6e 63 6c 75 64  65 2f 73 74 64 63 2d 70  |r/include/stdc-p|
  50  72 65 64 65 66 2e 68 22  20 31 20 33 20 34 0a 23  |redef.h" 1 3 4.#|
  60  20 31 20 22 3c 63 6f 6d  6d 61 6e 64 2d 6c 69 6e  | 1 "<command-lin|
  70  65 3e 22 20 32 0a 23 20  31 20 22 66 6f 6f 01 62  |e>" 2.# 1 "foo.b|
  80  61 72 2e 63 70 70 22 0a                           |ar.cpp".|
  88

(Note that my terminal seems to hide the control character in the direct
gcc output, but obviously no octal escape is present, and hexdump
confirms that the raw byte is present)

Looking at the code, you can see the line marker is generated here:
https://github.com/gcc-mirror/gcc/blob/edd716b6b1caa1a5cb320a8cd7f626f30198e098/gcc/c-family/c-ppoutput.c#L413-L415

And the escaping of the filename happens here:
https://github.com/gcc-mirror/gcc/blob/a588355ab948cf551bc9d2b89f18e5ae5140f52c/libcpp/macro.c#L491-L511

So only \ and " are escaped, nothing else.
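The escaping behaviour described above can be sketched as a small standalone helper. quote_linemarker_name is a hypothetical name, not a libcpp function; it only mimics the rule that `\` and `"` get a backslash prefix while every other byte, including control characters such as 0x01, is copied through raw.

```c
/* Hypothetical helper (not libcpp code): quote a file name for a line
 * marker, escaping only backslash and double quote. All other bytes,
 * including non-printable ones, are copied unchanged, which is why the
 * raw 0x01 byte shows up in the gcc -E output above.
 * out must have room for 2 * strlen(name) + 3 bytes. */
static void quote_linemarker_name(const char *name, char *out)
{
    *out++ = '"';
    for (; *name != '\0'; name++) {
        if (*name == '\\' || *name == '"')
            *out++ = '\\';      /* the only two characters that get escaped */
        *out++ = *name;         /* everything else is emitted as-is */
    }
    *out++ = '"';
    *out = '\0';
}
```

For the file name from the testcase, the output is `"foo`, the raw 0x01 byte, then `bar.cpp"`; the documented octal escape would instead require a branch that emits `\001` for non-printable bytes.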

[Bug c++/43745] [avr] g++ puts VTABLES in SRAM

2017-09-02 Thread matthijs at stdin dot nl
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=43745

Matthijs Kooijman  changed:

   What|Removed |Added

 CC||matthijs at stdin dot nl

--- Comment #12 from Matthijs Kooijman  ---
Apologies if this is an obvious question, but I'm not familiar with gcc/g++
internals. Georg-Johann, you say this requires address space support in c++,
but I'm not sure I follow you there. Two things:
 - You say WG21 will never add AS support to C++, but also say that language
support for AS is not needed, only internal support in gcc/g++. Does that mean
what WG21 does is not relevant for vtable handling in particular?
 - Even if AS were not used, what prevents g++ from emitting the vtables in
the `progmem.data` section and generating vtable-invocation code using LPM
instructions? This behaviour could be toggled using a command-line option, or
some gcc-specific attribute on a class?

I would be happy if you could comment on the feasibility of these two
approaches, thanks!

[Bug c++/43745] [avr] g++ puts VTABLES in SRAM

2017-09-04 Thread matthijs at stdin dot nl
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=43745

--- Comment #14 from Matthijs Kooijman  ---
Thanks for the additional explanations!

[Bug other/60145] [AVR] Suboptimal code for byte order shuffling using shift and or

2016-11-28 Thread matthijs at stdin dot nl
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60145

--- Comment #3 from Matthijs Kooijman  ---
Thanks for digging into this :-D

I suppose you meant
https://gcc.gnu.org/viewcvs/gcc?view=revision&revision=242907 instead of the
commit you linked (which is also nice btw, I noticed that extra sbiw in some
places as well).

Looking at the generated assembly, the optimizations look like fairly simple
(composable) translations, but I assume that the optimization needs to happen
before/while the assembly is generated, not afterwards. And then I can see that
the patterns would indeed become complex.

My goal was indeed to compose values. Using a union is endian-dependent, which
is a downside.

If I understand your vector-example correctly, vectors are always stored as big
endian, so using this approach would be portable? I couldn't find anything
about this in the documentation, though.

[Bug target/77326] [avr] Invalid optimization omits comparison

2016-09-21 Thread matthijs at stdin dot nl
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77326

--- Comment #6 from Matthijs Kooijman  ---
Thanks!

[Bug target/60300] [avr] Suboptimal stack pointer manipulation for frame setup

2014-04-08 Thread matthijs at stdin dot nl
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60300

--- Comment #3 from Matthijs Kooijman  ---
Hmm, I don't think the gcc sources support that. AFAICT, they attempt to just
find the shortest approach, without caring for speed. For example, look at
avr.c, around line 1265, where it says:

  /* Use shortest method */

  emit_insn (get_sequence_length (sp_plus_insns)
             < get_sequence_length (fp_plus_insns)
             ? sp_plus_insns
             : fp_plus_insns);

https://github.com/mirrors/gcc/blob/c2e306f5efb32b7eed856a1844487cff09aa86ac/gcc/config/avr/avr.c#L1265-L1270


[Bug target/60300] [avr] Suboptimal stack pointer manipulation for frame setup

2014-05-12 Thread matthijs at stdin dot nl
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60300

--- Comment #5 from Matthijs Kooijman  ---
Ah, then the comments are a bit misleading, yes. Wouldn't it make sense to put
this decision outside of avr_sp_immediate_operand, in the same area where the
decision between the two options is made? It might lead to a bit of
duplication, though, since the function seems to be called twice.

In any case, from a user perspective, it surprises me that this exception is
made, even when compiling with -Os. Wouldn't it make sense to ignore the range
check with -Os? Or is -Os really only used to determine the list of
optimizations to (not) run and not supposed to influence the behaviour of the
compiler otherwise?


[Bug other/60145] New: [AVR] Suboptimal code for byte order shuffling using shift and or

2014-02-11 Thread matthijs at stdin dot nl
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60145

Bug ID: 60145
   Summary: [AVR] Suboptimal code for byte order shuffling using
shift and or
   Product: gcc
   Version: 4.8.2
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: other
  Assignee: unassigned at gcc dot gnu.org
  Reporter: matthijs at stdin dot nl

(Not sure what the component should be, just selected "other" for now)

Using shifts and bitwise-or to compose multiple bytes into a bigger integer
results in suboptimal code on AVR.

For example, below are a few simple functions that take two or four bytes and
compose them into (big-endian) integers. Since AVR is an 8-bit platform,
this essentially just means moving the bytes from the argument registers
to the return value registers. However, the generated assembly is
significantly bigger than that and contains obvious optimization
opportunities.

The example below also contains a version that uses a union to compose the
integer, which gets optimized as expected (but only works on little-endian
systems, since it relies on the native endianness of uint16_t).

matthijs@grubby:~$ cat foo.c
#include <stdint.h>

uint16_t join2(uint8_t a, uint8_t b) {
    return ((uint16_t)a << 8) | b;
}

uint16_t join2_efficient(uint8_t a, uint8_t b) {
    union {
        uint16_t uint;
        uint8_t arr[2];
    } tmp = {.arr = {b, a}};
    return tmp.uint;
}

uint32_t join4(uint8_t a, uint8_t b, uint8_t c, uint8_t d) {
    return ((uint32_t)a << 24) | ((uint32_t)b << 16) | ((uint32_t)c << 8) |
           d;
}
matthijs@grubby:~$ avr-gcc -c foo.c -O3 && avr-objdump -d foo.o

foo.o: file format elf32-avr


Disassembly of section .text:

0000 <join2>:
   0:   70 e0   ldi r23, 0x00   ; 0
   2:   26 2f   mov r18, r22
   4:   37 2f   mov r19, r23
   6:   38 2b   or  r19, r24
   8:   82 2f   mov r24, r18
   a:   93 2f   mov r25, r19
   c:   08 95   ret

000e <join2_efficient>:
   e:   98 2f   mov r25, r24
  10:   86 2f   mov r24, r22
  12:   08 95   ret

0014 <join4>:
  14:   0f 93   push r16
  16:   1f 93   push r17
  18:   02 2f   mov r16, r18
  1a:   10 e0   ldi r17, 0x00   ; 0
  1c:   20 e0   ldi r18, 0x00   ; 0
  1e:   30 e0   ldi r19, 0x00   ; 0
  20:   14 2b   or  r17, r20
  22:   26 2b   or  r18, r22
  24:   38 2b   or  r19, r24
  26:   93 2f   mov r25, r19
  28:   82 2f   mov r24, r18
  2a:   71 2f   mov r23, r17
  2c:   60 2f   mov r22, r16
  2e:   1f 91   pop r17
  30:   0f 91   pop r16
  32:   08 95   ret
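The endianness caveat of the union version can be checked on any host with the two functions from the report plus a small helper, host_is_little_endian(), which is a hypothetical addition and not part of the report:

```c
#include <stdint.h>
#include <string.h>

/* Shift-and-or composition from the report: defined arithmetically, so
 * the result is the same on every host. */
uint16_t join2(uint8_t a, uint8_t b)
{
    return (uint16_t)(((uint16_t)a << 8) | b);
}

/* Union composition from the report: b is stored in the first byte in
 * memory and a in the second, which only makes a the high byte on a
 * little-endian host such as AVR. */
uint16_t join2_efficient(uint8_t a, uint8_t b)
{
    union {
        uint16_t uint;
        uint8_t arr[2];
    } tmp = {.arr = {b, a}};
    return tmp.uint;
}

/* Hypothetical helper: returns 1 if the least significant byte of a
 * uint16_t is stored first in memory. */
int host_is_little_endian(void)
{
    uint16_t one = 1;
    uint8_t first;
    memcpy(&first, &one, 1);
    return first == 1;
}
```

On a little-endian host both functions return 0x1234 for `(0x12, 0x34)`; on a big-endian host the union version returns 0x3412 instead, which is why the shift-and-or form is the portable one to optimize.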


[Bug target/60300] New: [avr] Suboptimal stack pointer manipulation for frame setup

2014-02-21 Thread matthijs at stdin dot nl
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60300

Bug ID: 60300
   Summary: [avr] Suboptimal stack pointer manipulation for frame
setup
   Product: gcc
   Version: 4.8.2
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: matthijs at stdin dot nl

For setting up the stack frame in the function prologue, gcc chooses between
directly manipulating the stack pointer with "rcall ." and "push"
instructions, and copying it to the frame pointer, modifying that and copying
it back, depending on which is shorter.

However, when the frame size is 7 or more, gcc picks the frame-pointer
approach, even when the direct manipulation approach would be shorter.

Here's the example (lines with dashes added by me to indicate the
relevant code):

$ cat foo.c
#include <stdint.h>

void bar(uint8_t *);

void foo() {
    uint8_t x[SIZE];
    bar(x);
}

$ diff -u <(avr-gcc -DSIZE=6 -c foo.c -o - -S) <(avr-gcc -D SIZE=7 -c foo.c -o - -S)
--- /dev/fd/63  2014-02-21 13:04:18.531142523 +0100
+++ /dev/fd/62  2014-02-21 13:04:18.535142628 +0100
@@ -10,21 +10,24 @@
 foo:
push r28
push r29
-   rcall .
-   rcall .
-   rcall .
in r28,__SP_L__
in r29,__SP_H__
+   sbiw r28,7
+   in __tmp_reg__,__SREG__
+   cli
+   out __SP_H__,r29
+   out __SREG__,__tmp_reg__
+   out __SP_L__,r28
 /* prologue: function */
-/* frame size = 6 */
-/* stack size = 8 */
-.L__stack_usage = 8
+/* frame size = 7 */
+/* stack size = 9 */
+.L__stack_usage = 9
mov r24,r28
mov r25,r29
adiw r24,1
rcall bar
 /* epilogue start */
-   adiw r28,6
+   adiw r28,7
in __tmp_reg__,__SREG__
cli
out __SP_H__,r29

As you can see, for SIZE=7 it switches to a 6-instruction sequence, when a
4-instruction sequence (3x rcall + 1x push) would also suffice.


Relevant code seems to be avr_prologue_setup_frame and avr_out_addto_sp:
 - https://github.com/mirrors/gcc/blob/c2e306f5efb32b7eed856a1844487cff09aa86ac/gcc/config/avr/avr.c#L1109-L1278
 - https://github.com/mirrors/gcc/blob/c2e306f5efb32b7eed856a1844487cff09aa86ac/gcc/config/avr/avr.c#L7002-L7014

That code tries both approaches to see which one is smaller, so
presumably it gets the size of one of them wrong and thus makes the
wrong decision.



Note that for the epilogue, the compiler has the turnover point at the expected
5/6 bytes of frame size:

$ diff -u <(avr-gcc -DSIZE=5 -c foo.c -o - -S) <(avr-gcc -D SIZE=6 -c foo.c -o - -S)
--- /dev/fd/63  2014-02-21 13:05:55.825616219 +0100
+++ /dev/fd/62  2014-02-21 13:05:55.821616121 +0100
@@ -12,23 +12,24 @@
push r29
rcall .
rcall .
-   push __zero_reg__
+   rcall .
in r28,__SP_L__
in r29,__SP_H__
 /* prologue: function */
-/* frame size = 5 */
-/* stack size = 7 */
-.L__stack_usage = 7
+/* frame size = 6 */
+/* stack size = 8 */
+.L__stack_usage = 8
mov r24,r28
mov r25,r29
adiw r24,1
rcall bar
 /* epilogue start */
-   pop __tmp_reg__
-   pop __tmp_reg__
-   pop __tmp_reg__
-   pop __tmp_reg__
-   pop __tmp_reg__
+   adiw r28,6
+   in __tmp_reg__,__SREG__
+   cli
+   out __SP_H__,r29
+   out __SREG__,__tmp_reg__
+   out __SP_L__,r28
pop r29
pop r28
ret
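The size comparison this report expects can be sketched with a simplified cost model. Assumptions: every instruction counts as one word, "rcall ." pushes a 2-byte return address (devices with at most 128 KiB of flash), and the `in r28,__SP_L__` / `in r29,__SP_H__` pair, which appears in both variants, is left out of both counts. All names are hypothetical, not GCC internals; the tie-break mirrors the `<` in the quoted `emit_insn` selection.

```c
/* Simplified model of the two frame setup/teardown strategies on AVR.
 * Only meaningful for frame sizes up to 63, the immediate range of
 * sbiw/adiw. */
enum frame_method { DIRECT_SP, VIA_FRAME_POINTER };

/* Prologue, direct: one "rcall ." per 2 bytes, plus one "push" if odd. */
static int prologue_direct_len(int frame_size)
{
    return frame_size / 2 + frame_size % 2;
}

/* Prologue via Y: sbiw, in SREG, cli, out SP_H, out SREG, out SP_L. */
static int prologue_fp_len(void) { return 6; }

/* Epilogue, direct: one "pop __tmp_reg__" per byte. */
static int epilogue_direct_len(int frame_size) { return frame_size; }

/* Epilogue via Y: adiw, in SREG, cli, out SP_H, out SREG, out SP_L. */
static int epilogue_fp_len(void) { return 6; }

/* "Use shortest method": the direct sequence wins only when strictly
 * shorter, so a tie goes to the frame-pointer sequence. */
static enum frame_method pick(int direct_len, int fp_len)
{
    return direct_len < fp_len ? DIRECT_SP : VIA_FRAME_POINTER;
}
```

Under this model the epilogue switches at 5/6 bytes (5 pops versus 6, then a 6-versus-6 tie), matching the second diff, while the prologue should still use the direct sequence at 7 bytes (4 instructions versus 6). That mismatch is the behaviour reported here.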


[Bug target/60300] [avr] Suboptimal stack pointer manipulation for frame setup

2014-02-21 Thread matthijs at stdin dot nl
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60300

--- Comment #1 from Matthijs Kooijman  ---
I noticed I didn't use -O in the output I pasted, but I just confirmed that the
results are the same with -Os and -O3.


[Bug tree-optimization/45791] Missed devirtualization

2014-02-24 Thread matthijs at stdin dot nl
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45791

Matthijs Kooijman  changed:

   What|Removed |Added

 CC||matthijs at stdin dot nl

--- Comment #15 from Matthijs Kooijman  ---
I ran into another variant of this problem, which I reduced to the following
testcase. I found the problem on 4.8.2, but it is already fixed in trunk /
gcc-4.9 (Debian 4.9-20140218-1). Still, it might be useful to have the testcase
here for reference.

class Base { };

class Sub : public Base {
public: 
virtual void bar();
};

Sub foo;
Sub * const pointer = &foo;
Sub* function() { return &foo; };

int main() {
// Gcc 4.8.2 devirtualizes this:
pointer->bar();
// but not this:
function()->bar();
}


[Bug other/60040] AVR: error: unable to find a register to spill in class 'POINTER_REGS'

2015-10-13 Thread matthijs at stdin dot nl
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60040

Matthijs Kooijman  changed:

   What|Removed |Added

 CC||matthijs at stdin dot nl

--- Comment #8 from Matthijs Kooijman  ---
Seems not - just tried with avr-gcc 5.1 and it is still broken:

$ avr-gcc -fpreprocessed -w -mmcu=atmega128 -O2 -s test.i -o /dev/null
test.i: In function 'rtems_fdisk_recycle_segment':
test.i:107:1: error: unable to find a register to spill in class 'POINTER_REGS'
 }
 ^
test.i:107:1: error: this is the insn:
(insn 30 29 31 2 (set (reg:HI 26 r26)
(reg/v/f:HI 51 [ dpd ])) /home/matthijs/test.i:95 83 {*movhi}
 (nil))
test.i:107: confused by earlier errors, bailing out

$ avr-gcc --version
avr-gcc (GCC) 5.1.0
Copyright (C) 2015 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

On the Arduino bugtracker [1], another testcase with the same symptoms was
reported. I'm attaching that here. This testcase works with -O2, but
breaks when -Os is used.

$ avr-gcc -c -Os  -mmcu=atmega328p  test2.c -o /dev/null

test2.c: In function 'getSlope':
test2.c:22:1: error: unable to find a register to spill in class 'POINTER_REGS'
 }
 ^
test2.c:22:1: error: this is the insn:
(insn 40 38 42 3 (set (reg:SF 63 [ D.1613 ])
(mem:SF (post_inc:HI (reg:HI 16 r16 [orig:73 ivtmp.13 ] [73]))
[1 MEM[base: _27, offset: 0B]+0 S4 A8])) /home/matthijs/test.c:15 100 {*movsf}
 (expr_list:REG_INC (reg:HI 16 r16 [orig:73 ivtmp.13 ] [73])
(nil)))
test2.c:22: confused by earlier errors, bailing out

[1]: https://github.com/arduino/Arduino/issues/3972


[Bug other/60040] AVR: error: unable to find a register to spill in class 'POINTER_REGS'

2015-10-13 Thread matthijs at stdin dot nl
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60040

--- Comment #9 from Matthijs Kooijman  ---
Created attachment 36499
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=36499&action=edit
Second testcase, needs -Os to break


[Bug target/66511] New: [avr] whole-byte shifts not optimized away for uint64_t

2015-06-11 Thread matthijs at stdin dot nl
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66511

Bug ID: 66511
   Summary: [avr] whole-byte shifts not optimized away for
uint64_t
   Product: gcc
   Version: 4.8.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: matthijs at stdin dot nl
  Target Milestone: ---

When doing whole-byte shifts, gcc usually optimizes away the shifts and
ends up moving data between registers instead. However, it seems this
doesn't happen when uint64_t is used.

Here's an example (assembler output slightly trimmed of unrelated
comments and annotations etc.):

matthijs@grubby:~$ cat test.cpp
#include <stdint.h>

uint8_t foo64_8(uint64_t a) {
    return a >> 8;
}

uint16_t foo64_16(uint64_t a) {
    return a >> 8;
}

uint8_t foo32_8(uint32_t a) {
    return a >> 8;
}

uint16_t foo32_16(uint32_t a) {
    return (a >> 8);
}

matthijs@grubby:~$ avr-gcc -Os test.cpp -S -o -
_Z7foo64_8y:
push r16
ldi r16,lo8(8)
rcall __lshrdi3
mov r24,r18
pop r16
ret

_Z8foo64_16y:
push r16
ldi r16,lo8(8)
rcall __lshrdi3
mov r24,r18
mov r25,r19
pop r16
ret


_Z7foo32_8m:
mov r24,r23
ret

_Z8foo32_16m:
clr r27
mov r26,r25
mov r25,r24
mov r24,r23
ret

.ident  "GCC: (GNU) 4.9.2 20141224 (prerelease)"

The output is identical for 4.8.1 on Debian, and the above 4.9.2 on
Arch. I haven't found a readily available 5.x package yet to test.

As you can see, the versions operating on 64 bit values preserve the
8-bit shift (which is very inefficient on AVR), while the versions
running on 32 bit values simply copy the right registers.

The foo32_16 function still has some useless instructions (r27 and r26
are not part of the return value, not sure why these are set) but that
is probably an unrelated problem.

I've marked this with component "target", since I think these
optimizations are avr-specific (or at least not applicable to bigger
architectures).


[Bug target/66511] [avr] whole-byte shifts not optimized away for uint64_t

2015-08-02 Thread matthijs at stdin dot nl
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66511

--- Comment #2 from Matthijs Kooijman  ---
So, IIUC, this is quite hard to fix? Either you use lib functions, which
prevents the optimizer from just relabeling or copying registers to apply
shifting, or you don't, and then more complex operations become very
verbose and messy?

Would it make sense (and be possible) to add a special case to not use lib
functions for shifts by a constant number of bits that is also a multiple of 8?
At first glance, that would make a lot of common cases (where an integer is
decomposed into separate bytes or other parts) a lot faster, while still
keeping the lib functions for more complex operations?
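Until the compiler special-cases such shifts, one possible source-level workaround (an assumption, not something proposed in this thread) is to narrow the value to 32 bits before shifting. This is exact whenever every bit that is kept comes from the low 32 bits, as with the 8- and 16-bit results here. The `_via32` names are hypothetical:

```c
#include <stdint.h>

/* Bits 8..15 of a: all of them lie in the low 32 bits, so shifting a
 * uint32_t gives the same result as shifting the full uint64_t. */
uint8_t foo64_8_via32(uint64_t a)
{
    return (uint8_t)((uint32_t)a >> 8);
}

/* Bits 8..23 of a: still entirely within the low 32 bits. */
uint16_t foo64_16_via32(uint64_t a)
{
    return (uint16_t)((uint32_t)a >> 8);
}
```

Whether avr-gcc then emits plain register moves instead of a `__lshrdi3` call is not verified here; the foo32_8/foo32_16 output in the original report suggests it would.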


[Bug target/77326] New: [avr] Invalid optimization using varargs and a weak function

2016-08-22 Thread matthijs at stdin dot nl
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77326

Bug ID: 77326
   Summary: [avr] Invalid optimization using varargs and a weak
function
   Product: gcc
   Version: 5.1.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: matthijs at stdin dot nl
  Target Milestone: ---

Created attachment 39483
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=39483&action=edit
Preprocessed source generated by avr-gcc foo.c -Dissue -save-temps

This bug was originally reported to the Arduino bug tracker[1], but seems to be
an AVR-specific gcc bug.

A minimal program showing the problem:

#include <stdarg.h>
#include <stddef.h>

void test(void) __attribute__((weak));

void va_pseudo(int flag, ...) {
    va_list ap;
    va_start(ap, flag);
    va_end(ap);
}

int main(void) {
#if defined(issue)
    va_pseudo(1, 2, 3, 4);
#else
    va_pseudo(1, 2, 3);
#endif

    if (test != NULL) {
        test();
    }
    return 0;
}

When compiled with -O but without -Dissue, this produces the following
assembler:

$ avr-gcc foo.c -O; avr-objdump -d a.out

a.out: file format elf32-avr


Disassembly of section .text:

0000 <va_pseudo>:
   0:   cf 93   push r28
   2:   df 93   push r29
   4:   cd b7   in  r28, 0x3d   ; 61
   6:   de b7   in  r29, 0x3e   ; 62
   8:   df 91   pop r29
   a:   cf 91   pop r28
   c:   08 95   ret

000e <main>:
   e:   1f 92   push r1
  10:   83 e0   ldi r24, 0x03   ; 3
  12:   8f 93   push r24
  14:   1f 92   push r1
  16:   82 e0   ldi r24, 0x02   ; 2
  18:   8f 93   push r24
  1a:   1f 92   push r1
  1c:   81 e0   ldi r24, 0x01   ; 1
  1e:   8f 93   push r24
  20:   ef df   rcall .-34      ; 0x0 <va_pseudo>
  22:   0f 90   pop r0
  24:   0f 90   pop r0
  26:   0f 90   pop r0
  28:   0f 90   pop r0
  2a:   0f 90   pop r0
  2c:   0f 90   pop r0
  2e:   80 e0   ldi r24, 0x00   ; 0
  30:   90 e0   ldi r25, 0x00   ; 0
  32:   89 2b   or  r24, r25
  34:   09 f0   breq .+2        ; 0x38 <main+0x2a>
  36:   e4 df   rcall .-56      ; 0x0 <va_pseudo>
  38:   80 e0   ldi r24, 0x00   ; 0
  3a:   90 e0   ldi r25, 0x00   ; 0
  3c:   08 95   ret

Note the lines from 0x2e to 0x34, which implement the `if(test!=NULL)`, which
should of course always fail and skip the next `rcall`. Now, when compiling
this with -Dissue, the `or r24, r25` line gets dropped, making the generated
code invalid:

$ avr-gcc foo.c -O -Dissue; avr-objdump -d a.out | grep -B 2 breq
  38:   80 e0   ldi r24, 0x00   ; 0
  3a:   90 e0   ldi r25, 0x00   ; 0
  3c:   09 f0   breq .+2        ; 0x40 <__SREG__+0x1>

The diff between without and with -Dissue looks like this (jump addresses have
been stripped to minimize the diff):

@@ -15,6 +15,9 @@ :

 :
 1f 92   push r1
+84 e0   ldi r24, 0x04   ; 4
+8f 93   push r24
+1f 92   push r1
 83 e0   ldi r24, 0x03   ; 3
 8f 93   push r24
 1f 92   push r1
@@ -24,16 +27,17 @@ :
 81 e0   ldi r24, 0x01   ; 1
 8f 93   push r24
 xx xx   rcall   ; 
-0f 90   pop r0
-0f 90   pop r0
-0f 90   pop r0
-0f 90   pop r0
-0f 90   pop r0
-0f 90   pop r0
+8d b7   in  r24, 0x3d   ; 61
+9e b7   in  r25, 0x3e   ; 62
+08 96   adiw r24, 0x08  ; 8
+0f b6   in  r0, 0x3f    ; 63
+f8 94   cli
+9e bf   out 0x3e, r25   ; 62
+0f be   out

[Bug target/77326] [avr] Invalid optimization using varargs and a weak function

2016-08-22 Thread matthijs at stdin dot nl
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77326

--- Comment #1 from Matthijs Kooijman  ---
The original reporter just added a comment that this does not occur anymore in
gcc 6.1.0, though I haven't got anything newer than 5.1 available here to
check.
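For comparison, the intended semantics of the weak-function check can be exercised with a host toolchain. This sketch assumes an ELF target (for example GCC on Linux), where an undefined weak function resolves to a null address; optional_hook and run_optional_hook are hypothetical names:

```c
#include <stddef.h>

/* Declared weak and never defined anywhere, like test() in the report,
 * so on ELF targets its address is NULL at run time. */
void optional_hook(void) __attribute__((weak));

/* Same pattern as `if(test!=NULL) test();`: the comparison must be kept,
 * and the call skipped, when the weak symbol is undefined.
 * Returns 1 if the hook was called, 0 otherwise. */
static int run_optional_hook(void)
{
    if (optional_hook != NULL) {
        optional_hook();
        return 1;
    }
    return 0;
}
```

Correctly compiled code returns 0 here. In the miscompiled AVR output, dropping the `or r24, r25` makes the `breq` test stale flags instead of the hook's address, which can lead to a call through a null pointer.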

[Bug target/100219] New: Arm/Cortex-M: Suboptimal code returning unaligned struct with non-empty stack frame

2021-04-22 Thread matthijs at stdin dot nl via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100219

Bug ID: 100219
   Summary: Arm/Cortex-M: Suboptimal code returning unaligned
struct with non-empty stack frame
   Product: gcc
   Version: 10.2.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: matthijs at stdin dot nl
  Target Milestone: ---

Consider the program below, which deals with functions returning a struct of
two members, either using a literal value or by forwarding the return value
from another function. When the struct has no alignment, this results in
suboptimal code that breaks the struct (stored in a single register) apart
into its members and reassembles them into a single register again, where it
could just have done absolutely nothing. Giving the struct some alignment
somehow prevents this problem from occurring.

Consider this program:

$ cat Foo.c
struct Result { char a, b; }
#if defined(ALIGN)
__attribute__((aligned(ALIGN)))
#endif
;

struct Result other(const int*);

struct Result func1() {
  int x;
  return other(&x);
}

struct Result func2() {
  struct Result y = {0x12, 0x34};
  return y;
}

struct Result func3() {
  return other(0);
}

Which produces the following code:

$ arm-linux-gnueabi-gcc-10 --version
arm-linux-gnueabi-gcc-10 (Ubuntu 10.2.0-5ubuntu1~20.04) 10.2.0
$ arm-linux-gnueabi-gcc-10 -fno-stack-protector -mcpu=cortex-m4 -c -O3 ~/Foo.c && objdump -d Foo.o

0000 <func1>:
   0:   b500        push    {lr}
   2:   b083        sub     sp, #12
   4:   a801        add     r0, sp, #4
   6:   f7ff fffe   bl      0 <other>
   a:   4603        mov     r3, r0
   c:   b2da        uxtb    r2, r3
   e:   2000        movs    r0, #0
  10:   f362 0007   bfi     r0, r2, #0, #8
  14:   f3c3 2307   ubfx    r3, r3, #8, #8
  18:   f363 200f   bfi     r0, r3, #8, #8
  1c:   b003        add     sp, #12
  1e:   f85d fb04   ldr.w   pc, [sp], #4
  22:   bf00        nop

0024 <func2>:
  24:   f243 4312   movw    r3, #13330      ; 0x3412
  28:   f003 0212   and.w   r2, r3, #18
  2c:   2000        movs    r0, #0
  2e:   f362 0007   bfi     r0, r2, #0, #8
  32:   0a1b        lsrs    r3, r3, #8
  34:   b082        sub     sp, #8
  36:   f363 200f   bfi     r0, r3, #8, #8
  3a:   b002        add     sp, #8
  3c:   4770        bx      lr
  3e:   bf00        nop

0040 <func3>:
  40:   b082        sub     sp, #8
  42:   2000        movs    r0, #0
  44:   b002        add     sp, #8
  46:   f7ff bffe   b.w     0 <other>
  4a:   bf00        nop


Especially note func2, which correctly builds the struct using a single word
literal, and then continues to break it apart and rebuild it.
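Why func2 can collapse into a single `movw` follows from the struct layout: on a little-endian target such as this Cortex-M4, member a occupies bits 0-7 and b bits 8-15 of r0, so `{0x12, 0x34}` is the 16-bit value 0x3412 (13330), exactly the literal in the disassembly. A minimal host-side check, with result_as_u16 as a hypothetical helper:

```c
#include <stdint.h>
#include <string.h>

/* Same layout as in the report: two chars, no extra alignment. */
struct Result { char a, b; };

/* Two one-byte members and no padding: the whole struct fits in the low
 * 16 bits of a single register. */
_Static_assert(sizeof(struct Result) == sizeof(uint16_t),
               "struct Result is expected to be exactly two bytes");

/* Hypothetical helper: reinterpret the struct's bytes as the 16-bit value
 * a register would hold after loading the struct from memory. */
static uint16_t result_as_u16(struct Result r)
{
    uint16_t v;
    memcpy(&v, &r, sizeof v);
    return v;
}
```

On a little-endian host this yields 0x3412 for func2's literal, so the subsequent `and.w`/`bfi`/`lsrs`/`bfi` sequence only rebuilds the value that `movw` already produced.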

Note that I added -fno-stack-protector to make the generated code more concise,
but the problem occurs even without this option.

Somehow, the alignment influences this, since adding some alignment makes the
problem disappear:

$ arm-linux-gnueabi-gcc-10 -fno-stack-protector -mcpu=cortex-m4 -c -O3 ~/Foo.c -DALIGN=2 && objdump -d Foo.o

Foo.o: file format elf32-littlearm


Disassembly of section .text:

0000 <func1>:
   0:   b500        push    {lr}
   2:   b083        sub     sp, #12
   4:   a801        add     r0, sp, #4
   6:   f7ff fffe   bl      0 <other>
   a:   b003        add     sp, #12
   c:   f85d fb04   ldr.w   pc, [sp], #4

0010 <func2>:
  10:   f243 4012   movw    r0, #13330      ; 0x3412
  14:   4770        bx      lr
  16:   bf00        nop

0018 <func3>:
  18:   2000        movs    r0, #0
  1a:   f7ff bffe   b.w     0 <other>
  1e:   bf00        nop


Other things I've observed:
 - When using ALIGN=2 or ALIGN=4, the problem disappears as shown above.
ALIGN=1 is equivalent to no alignment. Using ALIGN=8 also makes the problem
disappear, but it seems this causes the return value to be passed in memory,
rather than in r0 directly.
 - Using -mcpu=arm8, or arm7tdmi, or some other arm cpus I tried, the problem
disappears. With all cortex variants I tried the problem stays, though
sometimes it seems slightly less severe.
 - I could not reproduce this on x86_64.
 - Using a struct with just 1 char, the problem disappears.
 - Using a struct with 4 chars, the problem stays (and becomes more pronounced
because there's more work to rebuild the struct).
 - Using a struct with 2 shorts, the problem disappears for func2, but stays
for func1.
 - Writing something equivalent in C++, the problem also appears (I originally
saw this problem in C++ and then tr

[Bug tree-optimization/97997] New: Missed optimization: Multiply of extended integer cannot overflow

2020-11-25 Thread matthijs at stdin dot nl via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97997

Bug ID: 97997
   Summary: Missed optimization: Multiply of extended integer
cannot overflow
   Product: gcc
   Version: 10.2.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: matthijs at stdin dot nl
  Target Milestone: ---

When an integer is extended and then multiplied by another integer of the
original size, the resulting multiplication can never overflow. However, gcc
does not seem to realize this. Consider:

#include <stdint.h>

uint16_t calc_u(uint16_t x) {
    return (uint32_t)x * 10 / 10;
}

If gcc would realize that x * 10 cannot overflow, it can optimize away the * 10
/ 10. However, it does not:

$ gcc-10 -Os -Wall -Wextra -pedantic foo.c && objdump -S --disassemble=calc_u a.out
11a0 :
11a0:   f3 0f 1e fa             endbr64
11a4:   0f b7 c7                movzwl %di,%eax
11a7:   b9 0a 00 00 00          mov    $0xa,%ecx
11ac:   31 d2                   xor    %edx,%edx
11ae:   6b c0 0a                imul   $0xa,%eax,%eax
11b1:   f7 f1                   div    %ecx
11b3:   c3                      retq

When doing the multiplication signed, this optimization *does* happen:

uint16_t calc_s(uint16_t x) {
    return (int32_t)x * 10 / 10;
}

$ gcc-10 -Os -Wall -Wextra -pedantic foo.c  && objdump -S --disassemble=calc_s a.out

1199 :
1199:   f3 0f 1e fa             endbr64
119d:   89 f8                   mov    %edi,%eax
119f:   c3                      retq

Since signed overflow is undefined, gcc presumably assumes that the
multiplication does not overflow and optimizes this. This shows that the
machinery for this optimization exists and works and suggests that the only
thing missing in the unsigned case is realizing that the overflow cannot
happen.
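A quick way to confirm that the unsigned case really is safe is to check every possible input. The C sketch below reuses calc_u() from the report; count_mismatches() is a hypothetical helper added only for this check, and the sketch assumes a host that provides uint16_t and uint32_t:

```c
#include <stdint.h>

/* calc_u() as in the report: widen to 32 bits, multiply, divide. */
static uint16_t calc_u(uint16_t x) {
    return (uint32_t)x * 10 / 10;
}

/* Count inputs for which calc_u(x) != x, over all 65536 values.
 * The largest product is 65535 * 10 = 655350, far below UINT32_MAX,
 * so the multiplication never wraps and the division recovers x. */
static unsigned count_mismatches(void) {
    unsigned bad = 0;
    for (uint32_t x = 0; x <= UINT16_MAX; x++) {
        if (calc_u((uint16_t)x) != x)
            bad++;
    }
    return bad;
}
```

count_mismatches() should return 0, which is the fact gcc would need to prove statically in order to fold the * 10 / 10 away.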

The above uses 16/32-bit numbers, but the same happens with 32/64-bit numbers
(though not with 8/16-bit, because then the operands are promoted to int and
the multiplication is always signed). When using -O2 or -O3, the code generated
for unsigned is different, but still not fully optimized.
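The reasoning in the report generalizes beyond 16/32 bits: the product of two N-bit unsigned values always fits in 2N bits, since (2^N - 1)^2 = 2^(2N) - 2^(N+1) + 1. A minimal C11 sketch: a compile-time check of the 32/64-bit bound, plus a _Generic probe showing the 8/16-bit promotion. narrow_product_is_int() is a hypothetical name, and the probe assumes int is wider than 16 bits, as on common hosted targets:

```c
#include <stdint.h>

/* Widest product of two 32-bit unsigned values:
 * (2^32 - 1)^2 = 2^64 - 2^33 + 1 = UINT64_MAX - 2 * UINT32_MAX,
 * which still fits in 64 bits, so (uint64_t)a * b with a, b <= UINT32_MAX
 * can never wrap. */
_Static_assert((uint64_t)UINT32_MAX * UINT32_MAX
                   == UINT64_MAX - 2 * (uint64_t)UINT32_MAX,
               "32x32-bit product fits in 64 bits");

/* For 8/16 bits the cast does not help: the uint16_t operand is promoted
 * to int (when int is wider than 16 bits), so this is a signed int
 * multiplication. Returns 1 if the expression has type int. */
static int narrow_product_is_int(uint8_t x) {
    return _Generic((uint16_t)x * 10, int: 1, default: 0);
}
```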

Maybe I'm missing some corner case of the C language that would make this
optimization incorrect, but I think it should be allowed.

The original code that triggered this report is:

#define ticks2us(t)   (uint32_t)((uint64_t)(t)*100 / TICKS_PER_SEC)

This could be optimized to a single multiply, or even a bit shift, instead of a
multiply and a division for particular values of TICKS_PER_SEC, while staying
generally applicable (but slower) for other values.
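To illustrate the ticks2us case, assume (hypothetically) that TICKS_PER_SEC is 25. Because (uint64_t)t * 100 cannot wrap for a 32-bit t, t * 100 / 25 is exactly t * 4, so the macro could become a single shift. The sketch compares the macro with that reduced form; both wrapper function names are made up for this example:

```c
#include <stdint.h>

/* Macro from the report. */
#define ticks2us(t)   (uint32_t)((uint64_t)(t)*100 / TICKS_PER_SEC)

/* Hypothetical tick rate, for illustration only: 100 is an exact
 * multiple of 25, so (t * 100) / 25 == t * 4 as long as t * 100 does
 * not wrap, which is guaranteed for a 32-bit t in 64-bit arithmetic. */
#define TICKS_PER_SEC 25

static uint32_t ticks2us_macro(uint32_t t) { return ticks2us(t); }

/* The reduced form a compiler could emit: one shift. The cast to
 * uint32_t keeps the same truncation as the macro. */
static uint32_t ticks2us_shift(uint32_t t) {
    return (uint32_t)((uint64_t)t << 2);
}
```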

I took a guess at the component, please correct that if needed.

$ gcc-10 --version
gcc-10 (Ubuntu 10.2.0-5ubuntu1~20.04) 10.2.0
Copyright (C) 2020 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

[Bug tree-optimization/97997] Missed optimization: Multiply of extended integer cannot overflow

2020-11-26 Thread matthijs at stdin dot nl via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97997

--- Comment #5 from Matthijs Kooijman  ---
Awesome, thanks for the quick response and fix!

[Bug libstdc++/106477] With -fno-exception operator new(nothrow) aborts instead of returning null

2023-01-15 Thread matthijs at stdin dot nl via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106477

--- Comment #6 from Matthijs Kooijman  ---
Ah, IIUC your patch does not treat -fno-exceptions specially, but just adds a
shortcut for the nothrow new version to skip calling regular new version if it
has not been replaced. In a normal build, that saves throw/catch overhead, and
in a no-exceptions build that prevents the abort associated with that throw.
Clever!

One corner case seems to be when the regular new version is replaced in a
no-exceptions build, but in that case that replacement has no way to signal
failure anyway, and if needed a user can just also replace the nothrow version.
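A rough C++ sketch of the shortcut as described in this comment. The flag g_regular_new_replaced and all shortcut_* names are hypothetical: the actual patch detects replacement differently (via aliases), which is not reproduced here, and these functions only mimic the operators rather than replacing them:

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical flag: whether the application replaced the regular operator new.
static bool g_regular_new_replaced = false;

// Stand-in for the library's default allocation path: returns null on failure.
static void* shortcut_default_alloc(std::size_t sz) noexcept {
    return std::malloc(sz ? sz : 1);
}

// Stand-in for "the regular operator new", which must not return null.
static void* shortcut_regular_new(std::size_t sz) {
    void* p = shortcut_default_alloc(sz);
    if (p == nullptr) {
#if __cpp_exceptions
        throw std::bad_alloc();
#else
        std::abort();
#endif
    }
    return p;
}

static void* shortcut_nothrow_new(std::size_t sz) noexcept {
    // Shortcut: regular new not replaced, so skip it and use the
    // null-returning path directly (no throw/catch, no abort).
    if (!g_regular_new_replaced)
        return shortcut_default_alloc(sz);
#if __cpp_exceptions
    try {
        return shortcut_regular_new(sz);
    } catch (const std::bad_alloc&) {
        return nullptr;
    }
#else
    // Corner case: a replaced regular new cannot signal failure here,
    // so any failure aborts inside it.
    return shortcut_regular_new(sz);
#endif
}
```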

I can't comment on the details of the patch wrt aliases and preprocessor stuff,
but the approach and the gist of the code looks ok to me.

[Bug libstdc++/106477] New: With -fno-exception operator new(nothrow) aborts instead of returning null

2022-07-29 Thread matthijs at stdin dot nl via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106477

Bug ID: 106477
   Summary: With -fno-exception operator new(nothrow) aborts
instead of returning null
   Product: gcc
   Version: 11.2.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: libstdc++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: matthijs at stdin dot nl
  Target Milestone: ---

The nothrow version of operator new is intended to return null on allocation
failure. However, when libstdc++ is compiled with -fno-exceptions, it aborts
instead.

The cause of this failure is that the nothrow operators work by calling the
regular operators, catching any allocation failure exception and turning that
into a null return. However, with -fno-exceptions, the regular operator aborts
instead of throwing, so the nothrow operator never gets a chance to return
null.
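The call chain described above can be modelled in C++ roughly as follows. This is a simplified sketch, not the libstdc++ source: sketch_regular_new() and sketch_nothrow_new() are hypothetical stand-ins for the two operators, shown as they behave with exceptions enabled:

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>

// Stand-in for the regular (throwing) operator new: throws std::bad_alloc
// on failure. A library built with -fno-exceptions aborts here instead.
static void* sketch_regular_new(std::size_t sz) {
    void* p = std::malloc(sz ? sz : 1);
    if (p == nullptr)
        throw std::bad_alloc();
    return p;
}

// The nothrow variant built on top of it: catch the failure and turn it
// into a null return. This only works if the regular version throws;
// if it aborts, the catch block is never reached.
static void* sketch_nothrow_new(std::size_t sz) noexcept {
    try {
        return sketch_regular_new(sz);
    } catch (const std::bad_alloc&) {
        return nullptr;
    }
}
```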

Originally, this *did* work as expected, because the nothrow operators would
just call malloc directly. However, as reported in
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68210 this violates the C++11
requirement that the nothrow versions must call the regular versions (so
applications can replace the regular version and get the nothrow for free), so
this was changed in
https://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=b66e5a95c0065fda3569a1bfd3766963a848a00d

Note this comment by Jonathan Wakely in the linked report, which actually
already warns against introducing the behavior I am describing (but the comment
was apparently not considered when applying the fix):
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68210#c2

In any case, we have two conflicting requirements:
 1. nothrow operators should return null on failure
 2. nothrow operators should call regular operators

I can see no way to satisfy both. Since -fno-exceptions is already violating
the spec, it would make sense to me to, when -fno-exceptions is specified, only
satisfy 1 and allow 2 to be violated (which is more of a fringe case anyway,
and applications can always replace the nothrow versions too to get the
behavior they need).

Essentially this would mean that with -fno-exceptions, the nothrow versions
would have to call malloc again directly like before (either duplicating code
like before, or maybe introducing a null-returning helper function?).
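A minimal sketch of the null-returning helper idea, assuming the helper (sketch_try_allocate(), a hypothetical name) wraps malloc plus the new_handler retry loop, and that the installed handler does not throw, which holds in a -fno-exceptions build. Both operator forms share the helper, and only the regular form turns failure into an exception or an abort:

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical helper: retry malloc, calling the installed new_handler
// between attempts, and report failure by returning nullptr.
// Assumes the handler does not throw.
static void* sketch_try_allocate(std::size_t sz) noexcept {
    if (sz == 0)
        sz = 1;
    void* p;
    while ((p = std::malloc(sz)) == nullptr) {
        std::new_handler handler = std::get_new_handler();
        if (handler == nullptr)
            return nullptr;
        handler();
    }
    return p;
}

// Regular form: failure throws, or aborts without exceptions.
static void* sketch_new(std::size_t sz) {
    void* p = sketch_try_allocate(sz);
    if (p == nullptr) {
#if __cpp_exceptions
        throw std::bad_alloc();
#else
        std::abort();
#endif
    }
    return p;
}

// Nothrow form: calls the helper directly, so it can return null
// even when the regular form would abort.
static void* sketch_new_nothrow(std::size_t sz) noexcept {
    return sketch_try_allocate(sz);
}
```

The trade-off is the one described above: the nothrow form no longer goes through the regular form, so replacing only the regular operator no longer affects it.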



To reproduce, I made a small testcase. I was originally seeing this in the
Arduino environment on an Atmel samd chip, but I made a self-contained testcase
and tested that using gcc from https://developer.arm.com (using the linker
script from Atmel/Arduino), which is compiled with -fno-exceptions.

The main testcase is simple: An _sbrk() implementation that always fails to
force allocation failure (overriding the default libnosys implementation that
always succeeds), and a single call to operator new that should return null,
but aborts:

$ cat test.cpp 
#include <new>

volatile void* foo;

extern "C"
void *_sbrk(int n) {
  // Just always fail allocation
  return (void*)-1;
}

int main() {
  // This should return nullptr, but actually aborts (with -fno-exceptions)
  foo = new (std::nothrow) char[65000];
  return 0;
}

In addition, I added a minimal startup.c for memory initialization and reset
vector and a linker script taken verbatim from
https://github.com/arduino/ArduinoCore-samd/raw/master/variants/arduino_zero/linker_scripts/gcc/flash_without_bootloader.ld
(I will attach both files next).

Compiled using:

$ ~/Downloads/gcc-arm-11.2-2022.02-x86_64-arm-none-eabi/bin/arm-none-eabi-gcc
-mcpu=cortex-m0plus -mthumb -g -fno-exceptions --specs=nosys.specs
--specs=nano.specs -Tflash_without_bootloader.ld -nostartfiles test.cpp
startup.c -lstdc++

Running this on the Arduino zero (using openocd and gdb to upload the code
through the EDBG port) shows it aborts:

Program received signal SIGINT, Interrupt.
_exit (rc=rc@entry=1) at
/data/jenkins/workspace/GNU-toolchain/arm-11/src/newlib-cygwin/libgloss/libnosys/_exit.c:16
16 
/data/jenkins/workspace/GNU-toolchain/arm-11/src/newlib-cygwin/libgloss/libnosys/_exit.c:
No such file or directory.
(gdb) bt
#0  _exit (rc=rc@entry=1) at
/data/jenkins/workspace/GNU-toolchain/arm-11/src/newlib-cygwin/libgloss/libnosys/_exit.c:16
#1  0x013a in abort () at
/data/jenkins/workspace/GNU-toolchain/arm-11/src/newlib-cygwin/newlib/libc/stdlib/abort.c:59
#2  0x0128 in operator new (sz=65000) at
/data/jenkins/workspace/GNU-toolchain/arm-11/src/gcc/libstdc++-v3/libsupc++/new_op.cc:54
#3  0x0106 in operator new[] (sz=) at
/data/jenkins/workspace/GNU-toolchain/arm-11/src/gcc/libstdc++-v3/libsupc++/new_opv.cc:32
#4  0x00fe in operator new[] (sz=) at
/data/jenkins/workspace/GNU-toolchain/arm-11/src/gcc/libstdc++-v3/libsupc++/new_opvnt.cc:38
#5  0x0034 in main () at test.cpp:17

[Bug libstdc++/106477] With -fno-exception operator new(nothrow) aborts instead of returning null

2022-07-29 Thread matthijs at stdin dot nl via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106477

--- Comment #1 from Matthijs Kooijman  ---
Created attachment 53382
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53382&action=edit
Testcase - main code

[Bug libstdc++/106477] With -fno-exception operator new(nothrow) aborts instead of returning null

2022-07-29 Thread matthijs at stdin dot nl via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106477

--- Comment #2 from Matthijs Kooijman  ---
Created attachment 53383
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53383&action=edit
Testcase - startup code

[Bug libstdc++/106477] With -fno-exception operator new(nothrow) aborts instead of returning null

2022-07-29 Thread matthijs at stdin dot nl via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106477

--- Comment #3 from Matthijs Kooijman  ---
Created attachment 53384
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53384&action=edit
Testcase - linker script for ATSAMD21G18 (Arduino zero)

[Bug libstdc++/68210] nothrow operator fails to call default new

2022-07-29 Thread matthijs at stdin dot nl via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68210

Matthijs Kooijman  changed:

   What|Removed |Added

 CC||matthijs at stdin dot nl

--- Comment #8 from Matthijs Kooijman  ---
Note that in comment:2, Jonathan Wakely pointed out a caveat:

> Also we certainly don't want to conform to the new requirement when
> libstdc++ is built with -fno-exceptions, because allocation failure
> would abort in operator new(size_t) and so the nothrow version never
> gets a chance to handle the exception and return null.

But this was not taken into account when implementing the fix for this issue,
meaning nothrow operators are now effectively useless with -fno-exceptions (and
there is thus no way to handle allocation failure other than aborting in that
case).

I created a new bug report about this here:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106477

[Bug target/103698] [12 regression] Code assigned to __attribute__((section(".data"))) generates invalid dwarf: leb128 operand is an undefined symbol

2023-04-06 Thread matthijs at stdin dot nl via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103698

Matthijs Kooijman  changed:

   What|Removed |Added

 CC||matthijs at stdin dot nl

--- Comment #4 from Matthijs Kooijman  ---
I also ran into this problem, with an STM32 codebase that uses libopencm3 (for
peripheral code and the linker script) and uses section(".data") to put a bit
of code in RAM (to prevent flash reads while programming flash).

To fix this problem in my code, I switched from section(".data") to
section(".ramtext"), which is a second section that is also put into RAM and
seems intended especially for this purpose. This works with the libopencm3
linker script, which defines this section, YMMV with other linker scripts. E.g.
from
https://github.com/libopencm3/libopencm3/blob/189017b25cebfc609a6c1a5a02047691ef845b1b/ld/linker.ld.S#L136:

.data : {
_data = .;
*(.data*)   /* Read-write initialized data */
*(.ramtext*)/* "text" functions to run in ram */
. = ALIGN(4);
_edata = .;
} >ram AT >rom


From looking at the linker script, it seems that .data and .ramtext are treated
pretty much in the same way, so I suspect that there is something else (maybe
some builtin rules in gcc/ld/as) that make the data section special in a way
that it causes this problem to be triggered.
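In code, the workaround amounts to changing the section name in the attribute. A minimal C sketch, assuming a linker script that, like the libopencm3 one, collects *(.ramtext*) into RAM; ram_resident_checksum() is a hypothetical placeholder for whatever code must run from RAM:

```c
#include <stdint.h>

/* Placed in ".ramtext" instead of ".data". With the libopencm3 linker
 * script, .ramtext input sections end up in the RAM-resident .data output
 * section; other linker scripts may need such a rule added by hand. */
__attribute__((section(".ramtext"), noinline))
uint32_t ram_resident_checksum(const uint8_t *buf, uint32_t len) {
    uint32_t sum = 0;
    for (uint32_t i = 0; i < len; i++)
        sum += buf[i];
    return sum;
}
```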

Hopefully this is helpful for anyone else running into this same problem.

[Bug middle-end/26724] __builtin_constant_p fails to recognise function with constant return

2023-04-11 Thread matthijs at stdin dot nl via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=26724

Matthijs Kooijman  changed:

   What|Removed |Added

 CC||matthijs at stdin dot nl

--- Comment #5 from Matthijs Kooijman  ---
I also ran into this problem in an embedded project and the workaround also
works for me - thanks!

I had already made a short testcase in godbolt for this before I found this
report. I'll share it here just in case it is useful for testing this problem
later: https://godbolt.org/z/s1eK6a3Pf

Here's the code:

#include <stdio.h>
// I added always_inline to see if that would help - seems to make no difference
//[[gnu::always_inline]] static inline bool always_true() __attribute__((always_inline));
static inline bool always_true() { return true; }

static constexpr inline bool constexpr_always_true() { return true; }

int main() {
    printf("DIRECT: %d\n", __builtin_constant_p(always_true()));
    bool var = always_true();
    printf("VIAVAR: %d\n", __builtin_constant_p(var));
    printf("CONSTEXPR: %d\n", __builtin_constant_p(constexpr_always_true()));
}

Gcc 12.2 outputs:

DIRECT: 0
VIAVAR: 1
CONSTEXPR: 1

Three additional observations:
 - clang seems to behave the same as gcc here.
 - Adding constexpr to the function definition also fixes the problem without
the workaround (but might not always be useful: constexpr has stricter
requirements than a __builtin_constant_p test).
 - Adding always_inline attributes makes no difference.