[Bug web/96996] New: Missed optimzation for constant members of non-constant objects
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96996

Bug ID: 96996
Summary: Missed optimzation for constant members of non-constant objects
Product: gcc
Version: 10.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: web
Assignee: unassigned at gcc dot gnu.org
Reporter: matthijs at stdin dot nl
Target Milestone: ---

When a global class instance is const and initialized using constant arguments to a constexpr constructor, any member references are optimized away (using the constant value rather than a lookup).

However, when the object is *not* const, but does have const members, such optimization does not happen.

$ cat test2.cpp; gcc -O3 -S -Wall -Wextra -fdump-tree-optimized=/dev/stdout test2.cpp
constexpr int v = 1;

struct Test {
  constexpr Test(int v, const int *p) : v(v), p(p) { }
  int const v;
  const int * const p;
};

const Test constant_test(v, &v);
Test non_constant_test(v, &v);

int constant_ref() { return constant_test.v + *constant_test.p; }
int non_constant_ref() { return non_constant_test.v + *non_constant_test.p; }

;; Function constant_ref (_Z12constant_refv, funcdef_no=3, decl_uid=2360, cgraph_uid=4, symbol_order=6)

constant_ref ()
{
  [local count: 1073741824]:
  return 2;
}

;; Function non_constant_ref (_Z16non_constant_refv, funcdef_no=4, decl_uid=2362, cgraph_uid=5, symbol_order=7)

non_constant_ref ()
{
  int _1;
  const int * _2;
  int _3;
  int _5;

  [local count: 1073741824]:
  _1 = non_constant_test.v;
  _2 = non_constant_test.p;
  _3 = *_2;
  _5 = _1 + _3;
  return _5;
}

In the constant_ref() case, the values are completely optimized and the return value is determined at compile time. In the non_constant_ref() case, the values are retrieved at runtime.

However, AFAICS there should be no way that these values can be modified at runtime, even when the object itself is not const, since the members are const. So AFAICS, it should be possible to evaluate non_constant_ref() at compile time as well.
Looking at the C++ spec (I'm quoting from the C++14 draft here), this would seem to be possible as well.

[basic.type.qualifier] says "A const object is an object of type const T or a non-mutable subobject of such an object."

If I read [intro.object] correctly, subobjects (such as non-static member variables) are also objects, so a non-static member variable declared const would be a "const object".

[dcl.type.cv] says "Except that any class member declared mutable (7.1.1) can be modified, any attempt to modify a const object during its lifetime (3.8) results in undefined behavior."

So, one can assume that the const member variable is not modified, because if it was, that would be undefined behavior.

There is still the caveat of "during its lifetime", IOW, what if you would destroy the object and create a new one in its place. However, see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80794#c5 for a discussion of this case. In short, replacing non_constant_test with a new object is possible, but only when it "does not contain any non-static data member whose type is const-qualified or a reference type", which does not hold for this object. It even sounds like this provision was made for pretty much this case.

I suspect that the reason it works for the const object now is the rules for constant expressions. [expr.const] defines rules for constant expressions and seems to only allow using (through lvalue-to-rvalue conversion) objects of non-integral types when they are constexpr. I can imagine that gcc derives that constant_test might be effectively constexpr, making any expressions that use it also effectively constant expressions. This same derivation probably does not happen for subobjects (I guess "constexpr" is not actually a concept that applies to subobjects at all). However, I think this does not mean this optimization would be invalid, just that it would happen on different grounds than the current optimization.
This issue is also related to https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80794, but AFAICS that is more about partial compilation and making assumptions about what an external function can or cannot do, while this issue is primarily about link-time (though maybe they are more similar internally, I don't exactly know). I believe that this optimization would be quite significant to make, since it allows better abstraction and separation of concerns (i.e. it allows writing a class to be generic, using constructor-supplied parameters, but if you pass constants for these parameters and have just a single instance of such a class, or when methods are inlined or constprop'd, there could be zero runtime overhead for this extra abstraction). Currently, I believe that you either have to accept runtime overhead, or resort to usin
[Bug c++/96996] Missed optimzation for constant members of non-constant objects
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96996

--- Comment #2 from Matthijs Kooijman ---
> Replacing non_constant_test with a new object is possible, and allowed. But
> the name "non_constant_test" cannot be used to refer to the new object, so
> any calls to non_constant_ref() after the object was replaced would have
> undefined behaviour. Which means the compiler can assume there are no such
> calls.

Thanks for clarifying. But then I could reason that *if* "non_constant_test" is replaced, then accessing it through the old name is undefined behavior, so that would make any value for the constant member variables (such as the original values before replacement) acceptable, right? Hence that does not conflict with applying this optimization, I'd think.
[Bug c++/96996] Missed optimzation for constant members of non-constant objects
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96996 --- Comment #5 from Matthijs Kooijman --- > But isn't there const_cast<> to change the value of p? Yes, that makes it possible to write to a const object, but actually doing so is undefined behavior (see [dcl.type.cv] I quoted above). The spec even makes this explicit about const_cast, [expr.const.cast] says: > [ Note: Depending on the type of the object, a write operation through > the pointer, lvalue or pointer to data member resulting from a > const_cast that casts away a const-qualifier may produce undefined > behavior (7.1.6.1). — end note ]
[Bug tree-optimization/80794] constant objects can be assumed to be immutable
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80794 --- Comment #10 from Matthijs Kooijman --- Also note that pr96996, that was marked as a duplicate of this report, talks about a notable subcase of the case presented in this report. While this report talks about constant complete objects (e.g. a variable marked const), pr96996 talks about const subobjects (e.g. a const member of a variable that might not be const itself). That pr has some motivation to argue that such const subobjects can be optimized in the same way as const complete objects. Maybe this is obvious for seasoned gcc devs, but I wanted to point it out regardless, since the cases seem distinct enough to me that one might end up fixing this for objects and forgetting about subobjects :-)
[Bug preprocessor/80753] __has_include and __has_include_next taints subsequent I/O errors
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80753

Matthijs Kooijman changed:

What    |Removed |Added
CC      |        |matthijs at stdin dot nl

--- Comment #3 from Matthijs Kooijman ---
I also ran into this; it is still a problem in gcc 10.0.1.

This is a particular problem in the Arduino environment, which relies on preprocessor error messages to automatically pick libraries to include. This bug prevents it from detecting that a particular library is needed and from adding it to the include path, breaking a build that would have worked without this bug.
[Bug preprocessor/80753] __has_include and __has_include_next taints subsequent I/O errors
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80753

--- Comment #4 from Matthijs Kooijman ---
I looked a bit at the source, and it seems the problem is (not surprisingly) that `__has_include` causes the header filename to be put into the cache, and an error message is only generated when the entire include path has been processed without resolving the entry from the cache (essentially this means that an error is only triggered when the entry is put into the cache, *except* when it happens as part of `__has_include`).

Relevant function is _cpp_find_file:
https://github.com/gcc-mirror/gcc/blob/da13b7737662da11f8fefb28eaf4ed7c50c51767/libcpp/files.c#L506

This is called with kind = _cpp_FFK_HAS_INCLUDE for `__has_include`, which prevents an error here:
https://github.com/gcc-mirror/gcc/blob/da13b7737662da11f8fefb28eaf4ed7c50c51767/libcpp/files.c#L591-L592

And the cache entry is immediately returned on subsequent calls here:
https://github.com/gcc-mirror/gcc/blob/da13b7737662da11f8fefb28eaf4ed7c50c51767/libcpp/files.c#L523-L526

It seems there are continuous lookups in the cache and it took me a while to realize how the cache actually works (I initially thought that maybe files that were *not* found were not actually put in the cache), but AFAIU the `pfile->file_hash` cache works like this:

- Hash slots are indexed by filename.
- Each slot contains a linked list of entries.
- Each entry records the directory the lookup starts from. This can be the start_dir, but also the first quote_include or bracket_include dir. IOW, the cache entry is only valid for a lookup that starts at that directory, or has progressed to that directory, to allow a "" include to prime the cache for a <> include as well.
- Once a file is found, or the search path is exhausted, the result is stored in the cache for start_dir and for "" and <> includes if the start dir for those has been passed.
Anyway, this means that a failed lookup started from __has_include() will cause the failed result to be put into the cache for one or more directories without emitting an error, and that result will then always be returned for subsequent includes, again without emitting an error.

An obvious fix would be to simply not put the result in the cache for _cpp_FFK_HAS_INCLUDE, but that would be a missed cache opportunity.

An alternative would be to add a boolean "error_emitted" to each _cpp_file* in the cache (it cannot be added to the cache entry itself, since the same _cpp_file* might end up in different cache entries), that defaults to false and is set to true by open_file_failed. Then, when kind is not _cpp_FFK_HAS_INCLUDE and the cache entry being returned has "error_emitted" set to false and has an errno set, call open_file_failed to emit the error. This would require open_file_failed to emit the right output in this case (i.e. the cached _cpp_file* should not rely on the context in which it was generated), but AFAICS this would be the case.

I'm not quite familiar with building and patching gcc (and I also really need to get back from this yak shave to my actual work), so I probably won't be providing a patch here. Maybe my above analysis enables someone else to do so? :-)
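To make the "error_emitted" idea a bit more concrete, here is a minimal standalone sketch of it. It is not libcpp code: `FileCache`, `CachedFile` and `LookupKind` are hypothetical stand-ins for `pfile->file_hash`, `_cpp_file` and the `_cpp_FFK_*` kinds, and the per-directory structure of the real cache is left out entirely. It only shows a failed lookup being cached silently for a `__has_include`-style query and the error being emitted later, once, when a real include hits the same cached entry.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for the _cpp_FFK_* lookup kinds.
enum class LookupKind { Include, HasInclude };

// Hypothetical stand-in for _cpp_file, reduced to what the sketch needs.
struct CachedFile {
    bool found = false;
    bool error_emitted = false;  // the proposed flag, defaults to false
};

class FileCache {
public:
    explicit FileCache(std::vector<std::string> existing)
        : existing_(std::move(existing)) {}

    // Looks up a file, caching both successful and failed results.
    bool find(const std::string& name, LookupKind kind) {
        auto it = cache_.find(name);
        if (it == cache_.end()) {
            CachedFile entry;
            entry.found = exists(name);
            it = cache_.emplace(name, entry).first;
        }
        CachedFile& file = it->second;
        // Emit the error on the first real include that sees the failure,
        // even if the entry was cached earlier by a HasInclude query.
        if (!file.found && kind != LookupKind::HasInclude && !file.error_emitted) {
            errors_.push_back(name + ": No such file or directory");
            file.error_emitted = true;
        }
        return file.found;
    }

    const std::vector<std::string>& errors() const { return errors_; }

private:
    bool exists(const std::string& name) const {
        for (const auto& e : existing_)
            if (e == name) return true;
        return false;
    }

    std::vector<std::string> existing_;
    std::unordered_map<std::string, CachedFile> cache_;
    std::vector<std::string> errors_;
};
```

With this shape, a `HasInclude` query for a missing file leaves `errors()` empty, and the first `Include` query for the same name adds exactly one message. Whether open_file_failed can really be called from the cached path in libcpp is something the sketch does not answer.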
[Bug c/39589] make -Wmissing-field-initializers=2 work with "designated initializers" ?
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=39589

Matthijs Kooijman changed:

What    |Removed |Added
CC      |        |matthijs at stdin dot nl

--- Comment #11 from Matthijs Kooijman ---
It seems this was actually implemented at some point (at least for C++, maybe that was the case all along already), though the manual page was not updated to reflect this. Taking the example from the manual (which is documented to *not* cause this warning):

matthijs@grubby:~$ cat foo.cpp
struct s { int f, g, h; };
struct s x = { .f = 3, .g = 4 };
matthijs@grubby:~$ gcc foo.cpp -c -Wall -Wextra
foo.cpp:2:31: warning: missing initializer for member ‘s::h’ [-Wmissing-field-initializers]
 struct s x = { .f = 3, .g = 4 };
                               ^

However, this seems to be the case only for C++: if I rename the file to foo.c, no warning is emitted.

I actually came here looking for a way to *disable* this warning for designated initializers on a specific struct. I was hoping to use a struct with designated initializers as an elegant way to specify configuration with optional fields (e.g. by letting any unspecified fields be initialized to 0 and filling in a default value for them). However, when any caller that omits a field to get the default value is pestered with a warning, that approach does not really work well. On the other hand, disabling the warning completely with a commandline option or pragma seems heavy-handed, since I do consider this a useful warning in many other cases.
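For illustration, here is a rough sketch of the configuration pattern described above, combined with a narrowly scoped `#pragma GCC diagnostic` as one possible middle ground between "warn everywhere" and "disable globally". The `Config` struct, its fields, the default values and `with_defaults` are all made up for the example. Designated initializers in C++ are standard only since C++20 (GCC accepts them as an extension in earlier modes), and it is an assumption here that the pragma covers this particular warning at the point where the front end emits it.

```cpp
#include <cassert>

// Hypothetical configuration struct: a zero field means "use the default".
struct Config {
    int baud;
    int timeout_ms;
};

// Fills in default values for any field the caller left at zero.
inline Config with_defaults(Config c) {
    if (c.baud == 0) c.baud = 9600;
    if (c.timeout_ms == 0) c.timeout_ms = 1000;
    return c;
}

// Suppress the warning only around the initializer that omits fields on
// purpose; it stays enabled for the rest of the translation unit.
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wmissing-field-initializers"
const Config user_config = with_defaults(Config{ .baud = 115200 });
#pragma GCC diagnostic pop
```

Here `user_config.baud` keeps the caller's 115200 and `user_config.timeout_ms` falls back to 1000. The drawback remains that every caller has to repeat the pragma pair, which is why a per-struct way to opt out would be nicer.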
[Bug lto/83967] LTO removes C functions declared as weak in assembler(depending on files order in linking)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83967 Matthijs Kooijman changed: What|Removed |Added CC||matthijs at stdin dot nl --- Comment #14 from Matthijs Kooijman --- I actually think this is a different problem from the fixed https://sourceware.org/bugzilla/show_bug.cgi?id=22502. Using gcc 8.2.1 and binutils 2.31.51.20181213 from the STM32 Arduino core (https://github.com/stm32duino/Arduino_Core_STM32), I can still reproduce this problem using the example from comment 11 (and also in an actual implementation using stm32duino). I also tested the example from the linked bug, which *is* indeed fixed, leading me to believe this is a different problem (or the fix is not complete yet). The example from this bug is a lot bigger than the one from 22502, so there is probably something in here that triggers this. Maybe that the weak implementation is defined in assembly rather than C?
[Bug target/92693] New: Inconsistency between __UINTPTR_TYPE__ and __UINT32_TYPE__ on ARM
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92693

Bug ID: 92693
Summary: Inconsistency between __UINTPTR_TYPE__ and __UINT32_TYPE__ on ARM
Product: gcc
Version: 7.3.1
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: matthijs at stdin dot nl
Target Milestone: ---

Gcc defines a number of macros for types, which are used by stdint.h to define the corresponding typedefs. In particular, I'm looking at uintptr_t. On ARM, this is 32 bits and equals unsigned int:

#define __UINTPTR_TYPE__ unsigned int

In my code, I was running into problems trying to pass a uintptr_t to a function that has overloads for uint8_t, uint16_t and uint32_t (ambiguous function call). Investigating, it turns out that uint32_t is defined as long unsigned int:

#define __UINT32_TYPE__ long unsigned int

I would expect that, since both types are 32 bits long, they would actually resolve to the same type. This would also make overload resolution work as expected. Is there any reason for this inconsistency, or could it be fixed?

To test this, I installed gcc-arm-none-eabi, version 15:7-2018-q2-6, from Ubuntu Disco (the same version should be in Debian testing):

$ arm-none-eabi-gcc --version
arm-none-eabi-gcc (15:7-2018-q2-6) 7.3.1 20180622 (release) [ARM/embedded-7-branch revision 261907]
Copyright (C) 2017 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

$ arm-none-eabi-gcc -dM -E -x c++ /dev/null |egrep '(UINTPTR_TYPE|UINT32_TYPE)'
#define __UINT32_TYPE__ long unsigned int
#define __UINTPTR_TYPE__ unsigned int

I see the same problem using gcc 8.2.1 shipped with the STM32 arduino core (https://github.com/stm32duino/Arduino_Core_STM32).

To illustrate the original problem I was seeing, here's a small testcase:

$ cat foo.cpp
#include

void func(uint16_t);
void func(uint32_t);

int main() {
 func((uintptr_t)nullptr);
 static_assert(sizeof(uintptr_t) == sizeof(uint32_t), "Sizes not equal");
}

$ arm-none-eabi-gcc -c foo.cpp
foo.cpp: In function 'int main()':
foo.cpp:7:25: error: call of overloaded 'func(uintptr_t)' is ambiguous
  func((uintptr_t)nullptr);
                         ^
foo.cpp:3:6: note: candidate: void func(uint16_t)
 void func(uint16_t);
      ^~~~
foo.cpp:4:6: note: candidate: void func(uint32_t)
 void func(uint32_t);
      ^~~~
[Bug target/92693] Inconsistency between __UINTPTR_TYPE__ and __UINT32_TYPE__ on ARM
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92693

--- Comment #3 from Matthijs Kooijman ---
> I don't see why you should expect that, there's nothing in the standards
> suggesting it should be the case.

This is true, current behaviour is standards-compliant AFAICS. However, I expect that because it would be consistent, and would make things behave with least surprise (at least for the usecase I suggested).

> Changing it would be an ABI change, so seems like a bad idea.

Good point.

I did a bit more searching and found this Linux kernel patch. The commit message suggests that it might at some point have been consistent:

https://patchwork.kernel.org/patch/2845139/

I assume that "bare metal GCC" would refer to the __xxx_TYPE__ macros, or at least whatever you get when you include .

> N.B. you get exactly the same overload failure if you call func(1u). The
> problem is your overload set, not the definition of uintptr_t.

Fair point, though I think that it is hard to define a proper overload set here. In my case, I'm defining functions to print various sizes of integers. Because the body of the function needs to know how big the type is, I'm using the uintxx_t types to define them. I could of course define the function for (unsigned) char, short, int, long, long long, but then I can't make any assumptions about the exact size of each (I could use sizeof and make a generic implementation, but I wanted to keep things simple and use a different implementation for each size).

I guess this might boil down to C/C++ being annoying when it comes to integer types, and not something GCC can really fix (though it *would* have been more convenient if this had been consistent from the start).

Feel free to close if that seems appropriate.
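As a rough sketch of the sizeof-based alternative mentioned above: a thin template can map any unsigned integer type to the fixed-width type of the same size and then call one implementation per size, so a call with uintptr_t or 1u is no longer ambiguous. The names `print_sized`, `uint_of_size` and `print_unsigned` are invented for this example, and the per-size functions just return the bit width as a stand-in for the real printing code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Hypothetical per-size implementations. Each returns the bit width it
// handles, standing in for size-specific printing code.
inline int print_sized(std::uint8_t)  { return 8; }
inline int print_sized(std::uint16_t) { return 16; }
inline int print_sized(std::uint32_t) { return 32; }
inline int print_sized(std::uint64_t) { return 64; }

// Maps a size in bytes to the fixed-width unsigned type of that size.
template <std::size_t Bytes> struct uint_of_size;
template <> struct uint_of_size<1> { using type = std::uint8_t; };
template <> struct uint_of_size<2> { using type = std::uint16_t; };
template <> struct uint_of_size<4> { using type = std::uint32_t; };
template <> struct uint_of_size<8> { using type = std::uint64_t; };

// Accepts any unsigned integer type and forwards to the overload for its
// size. The cast makes the argument an exact match for one overload.
template <typename T>
int print_unsigned(T value) {
    static_assert(std::is_unsigned<T>::value, "unsigned integer types only");
    using Fixed = typename uint_of_size<sizeof(T)>::type;
    return print_sized(static_cast<Fixed>(value));
}
```

A call such as `print_unsigned((uintptr_t)0)` then picks the implementation matching the pointer size on the target, regardless of whether uintptr_t is unsigned int or long unsigned int there.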
[Bug tree-optimization/93359] New: Miscompile (loop check omitted) in function with missing return statement
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93359

Bug ID: 93359
Summary: Miscompile (loop check omitted) in function with missing return statement
Product: gcc
Version: 9.1.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: matthijs at stdin dot nl
Target Milestone: ---

I came across a miscompile, where a missing return statement in a function resulted in a simple for loop never terminating. I originally found this in an STM32 ARM Arduino environment, but managed to reduce this to just a few lines of code and found that it also occurs on x86_64.

Here's the testcase:

matthijs@grubby:~$ cat foo.cpp
#include

volatile uint8_t *REG = 0x0;

uint8_t foo() {
  for (int i = 0; i < 16; i++)
    *REG = i;
}

int main() { foo(); }

I used a volatile "register" write here just to have something in the loop that will not be optimized away; I originally had actual code in there.

Compiling this results in a loop that never terminates:

matthijs@grubby:~$ gcc-8 -Os -c foo.cpp
foo.cpp: In function ‘uint8_t foo()’:
foo.cpp:8:1: warning: no return statement in function returning non-void [-Wreturn-type]
 }
 ^
matthijs@grubby:~$ objdump -S foo.o

foo.o: file format elf64-x86-64

Disassembly of section .text:

<_Z3foov>:
   0: 31 c0                 xor    %eax,%eax
   2: 48 8b 15 00 00 00 00  mov    0x0(%rip),%rdx        # 9 <_Z3foov+0x9>
   9: 88 02                 mov    %al,(%rdx)
   b: ff c0                 inc    %eax
   d: eb f3                 jmp    2 <_Z3foov+0x2>

Disassembly of section .text.startup:

:
   0: e8 00 00 00 00        callq  5

I get identical output on gcc 9, but gcc 7 produces code as expected:

matthijs@grubby:~$ gcc-7 -Os -c foo.cpp
matthijs@grubby:~$ objdump -S foo.o

foo.o: file format elf64-x86-64

Disassembly of section .text:

<_Z3foov>:
   0: 31 c0                 xor    %eax,%eax
   2: 48 8b 15 00 00 00 00  mov    0x0(%rip),%rdx        # 9 <_Z3foov+0x9>
   9: 88 02                 mov    %al,(%rdx)
   b: ff c0                 inc    %eax
   d: 83 f8 10              cmp    $0x10,%eax
  10: 75 f0                 jne    2 <_Z3foov+0x2>
  12: c3                    retq

Disassembly of section .text.startup:

:
   0: e8 00 00 00 00        callq  5
   5: 31 c0                 xor    %eax,%eax
   7: c3                    retq

Also, compiling with -O0 produces working code:

matthijs@grubby:~$ gcc-8 -O0 -c foo.cpp
foo.cpp: In function ‘uint8_t foo()’:
foo.cpp:8:1: warning: no return statement in function returning non-void [-Wreturn-type]
 }
 ^
matthijs@grubby:~$ objdump -S foo.o

foo.o: file format elf64-x86-64

Disassembly of section .text:

<_Z3foov>:
   0: 55                    push   %rbp
   1: 48 89 e5              mov    %rsp,%rbp
   4: c7 45 fc 00 00 00 00  movl   $0x0,-0x4(%rbp)
   b: 83 7d fc 0f           cmpl   $0xf,-0x4(%rbp)
   f: 7f 12                 jg     23 <_Z3foov+0x23>
  11: 48 8b 05 00 00 00 00  mov    0x0(%rip),%rax        # 18 <_Z3foov+0x18>
  18: 8b 55 fc              mov    -0x4(%rbp),%edx
  1b: 88 10                 mov    %dl,(%rax)
  1d: 83 45 fc 01           addl   $0x1,-0x4(%rbp)
  21: eb e8                 jmp    b <_Z3foov+0xb>
  23: 90                    nop
  24: 5d                    pop    %rbp
  25: c3                    retq

0026 :
  26: 55                    push   %rbp
  27: 48 89 e5              mov    %rsp,%rbp
  2a: e8 00 00 00 00        callq  2f
  2f: b8 00 00 00 00        mov    $0x0,%eax
  34: 5d                    pop    %rbp
  35: c3                    retq

This seems C++-specific: when I rename foo.cpp to foo.c and compile, it produces output as expected.

Here are the compiler versions I used; these are just the plain Ubuntu x86_64 versions:

matthijs@grubby:~$ gcc-7 -v
Using built-in specs.
COLLECT_GCC=gcc-7
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/7/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none
OFFLOAD_TARGET_DEFAULT=1
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu 7.4.0-8ubuntu1' --with-bugurl=file:///usr/share/doc/gcc-7/README.Bugs --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++ --prefix=/usr --with-gcc-major-version-only --program-suffix=-7 --program-prefix=x86_64-linux-gnu- --ena
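
A note for readers who land on this report: in C++, flowing off the end of a function with a non-void return type (other than main) is undefined behavior, whereas in C it is only undefined if the caller uses the returned value. That difference is a likely explanation for the C++-only behaviour, since the optimizer may treat the point after the loop as unreachable. Below is a minimal sketch of the testcase with the missing return removed from the picture by making the function return void. It assumes an ordinary byte of storage (`reg_storage`, a name made up here) in place of the register at address 0, so that it can run on a hosted system.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: a real byte of storage stands in for the memory-mapped
// register, so the loop can be executed safely on a normal machine.
static volatile std::uint8_t reg_storage = 0;
volatile std::uint8_t *REG = &reg_storage;

// Returning void means control may legally reach the closing brace, so
// the compiler has to keep the loop exit condition.
void foo() {
    for (int i = 0; i < 16; i++)
        *REG = static_cast<std::uint8_t>(i);
}
```

Keeping the uint8_t return type and adding an explicit return statement after the loop should have the same effect.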
[Bug target/56533] New: Linker problem on avr with lto and main function inside archive
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56533

Bug #: 56533
Summary: Linker problem on avr with lto and main function inside archive
Classification: Unclassified
Product: gcc
Version: 4.7.2
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: matth...@stdin.nl

When trying to add lto to my Arduino program, it stopped compiling, complaining about missing symbols. I've managed to reduce the problem to the minimal example below.

Note that removing anything from the example below makes the problem disappear. In particular, the problem disappears when:

* any of the linker options is removed: -mmcu=atmega328p -Os -flto -fwhole-program
* the -flto compiler option is removed
* using normal gcc (amd64) instead of avr-gcc
* linking main.o instead of main.a
* declaring realmain as externally_visible in realmain.c

Note that in this example, the actual main() function is inside an archive, which is probably the reason for this bug / problem.

$ avr-gcc --version
avr-gcc (GCC) 4.7.2
Copyright (C) 2012 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

$ avr-ld --version
GNU ld (GNU Binutils) 2.20.1.20100303
Copyright 2009 Free Software Foundation, Inc.
This program is free software; you may redistribute it under the terms of
the GNU General Public License version 3 or (at your option) a later version.
This program has absolutely no warranty.

$ cat main.c
int realmain(void);
int main(void) { return realmain(); }

$ cat realmain.c
int realmain(void) { }

$ cat do
#!/bin/sh
set -x
rm -f main.a
/usr/bin/avr-gcc -c main.c -o main.o
/usr/bin/avr-ar rcs main.a main.o
/usr/bin/avr-gcc -c -flto realmain.c -o realmain.o
/usr/bin/avr-gcc -mmcu=atmega328p -Os -flto -fwhole-program realmain.o main.a

$ ./do
+ rm -f main.a
+ /usr/bin/avr-gcc -c main.c -o main.o
+ /usr/bin/avr-ar rcs main.a main.o
+ /usr/bin/avr-gcc -c -flto realmain.c -o realmain.o
+ /usr/bin/avr-gcc -mmcu=atmega328p -Os -flto -fwhole-program realmain.o main.a
main.a(main.o): In function `main':
main.c:(.text+0x8): undefined reference to `realmain'
collect2: error: ld returned 1 exit status
[Bug target/56533] Linker problem on avr with lto and main function inside archive
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56533 --- Comment #2 from Matthijs Kooijman 2013-03-05 12:55:47 UTC --- + /usr/bin/avr-gcc -v -mmcu=atmega328p -Os -flto -fwhole-program realmain.o main.a Using built-in specs. COLLECT_GCC=/usr/bin/avr-gcc COLLECT_LTO_WRAPPER=/usr/lib/gcc/avr/4.7.2/lto-wrapper Target: avr Configured with: ../src/configure -v --enable-languages=c,c++ --prefix=/usr/lib --infodir=/usr/share/info --mandir=/usr/share/man --bindir=/usr/bin --libexecdir=/usr/lib --libdir=/usr/lib --enable-shared --with-system-zlib --enable-long-long --enable-nls --without-included-gettext --disable-libssp --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=avr Thread model: single gcc version 4.7.2 (GCC) COMPILER_PATH=/usr/lib/gcc/avr/4.7.2/:/usr/lib/gcc/avr/4.7.2/:/usr/lib/gcc/avr/:/usr/lib/gcc/avr/4.7.2/:/usr/lib/gcc/avr/:/usr/lib/gcc/avr/4.7.2/../../../avr/bin/ LIBRARY_PATH=/usr/lib/gcc/avr/4.7.2/avr5/:/usr/lib/gcc/avr/4.7.2/../../../avr/lib/avr5/:/usr/lib/gcc/avr/4.7.2/:/usr/lib/gcc/avr/4.7.2/../../../avr/lib/ COLLECT_GCC_OPTIONS='-v' '-mmcu=atmega328p' '-Os' '-flto' '-fwhole-program' /usr/lib/gcc/avr/4.7.2/collect2 -flto -m avr5 -Tdata 0x800100 /usr/lib/gcc/avr/4.7.2/../../../avr/lib/avr5/crtm328p.o -L/usr/lib/gcc/avr/4.7.2/avr5 -L/usr/lib/gcc/avr/4.7.2/../../../avr/lib/avr5 -L/usr/lib/gcc/avr/4.7.2 -L/usr/lib/gcc/avr/4.7.2/../../../avr/lib realmain.o main.a -lgcc -lc -lgcc /usr/bin/avr-gcc @/tmp/ccYrSTvi.args Using built-in specs. 
COLLECT_GCC=/usr/bin/avr-gcc Target: avr Configured with: ../src/configure -v --enable-languages=c,c++ --prefix=/usr/lib --infodir=/usr/share/info --mandir=/usr/share/man --bindir=/usr/bin --libexecdir=/usr/lib --libdir=/usr/lib --enable-shared --with-system-zlib --enable-long-long --enable-nls --without-included-gettext --disable-libssp --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=avr Thread model: single gcc version 4.7.2 (GCC) COLLECT_GCC_OPTIONS='-c' '-v' '-mmcu=atmega328p' '-Os' '-fwhole-program' '-fltrans-output-list=/tmp/ccZEu3t3.ltrans.out' '-fwpa' /usr/lib/gcc/avr/4.7.2/lto1 -quiet -dumpbase realmain.o -mmcu=atmega328p -auxbase realmain -Os -version -fwhole-program -fltrans-output-list=/tmp/ccZEu3t3.ltrans.out -fwpa @/tmp/ccOsOe32 GNU GIMPLE (GCC) version 4.7.2 (avr) compiled by GNU C version 4.7.2, GMP version 5.0.5, MPFR version 3.1.0-p10, MPC version 0.9 GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072 GNU GIMPLE (GCC) version 4.7.2 (avr) compiled by GNU C version 4.7.2, GMP version 5.0.5, MPFR version 3.1.0-p10, MPC version 0.9 GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072 COMPILER_PATH=/usr/lib/gcc/avr/4.7.2/:/usr/lib/gcc/avr/4.7.2/:/usr/lib/gcc/avr/:/usr/lib/gcc/avr/4.7.2/:/usr/lib/gcc/avr/:/usr/lib/gcc/avr/4.7.2/../../../avr/bin/:/usr/lib/gcc/avr/4.7.2/:/usr/lib/gcc/avr/4.7.2/:/usr/lib/gcc/avr/:/usr/lib/gcc/avr/4.7.2/:/usr/lib/gcc/avr/:/usr/lib/gcc/avr/4.7.2/../../../avr/bin/ LIBRARY_PATH=/usr/lib/gcc/avr/4.7.2/avr5/:/usr/lib/gcc/avr/4.7.2/../../../avr/lib/avr5/:/usr/lib/gcc/avr/4.7.2/:/usr/lib/gcc/avr/4.7.2/../../../avr/lib/ COLLECT_GCC_OPTIONS='-c' '-v' '-mmcu=atmega328p' '-Os' '-fwhole-program' '-fltrans-output-list=/tmp/ccZEu3t3.ltrans.out' '-fwpa' /usr/bin/avr-gcc @/tmp/ccoysJBM.args Using built-in specs. 
COLLECT_GCC=/usr/bin/avr-gcc Target: avr Configured with: ../src/configure -v --enable-languages=c,c++ --prefix=/usr/lib --infodir=/usr/share/info --mandir=/usr/share/man --bindir=/usr/bin --libexecdir=/usr/lib --libdir=/usr/lib --enable-shared --with-system-zlib --enable-long-long --enable-nls --without-included-gettext --disable-libssp --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=avr Thread model: single gcc version 4.7.2 (GCC) COLLECT_GCC_OPTIONS='-c' '-v' '-mmcu=atmega328p' '-Os' '-fwhole-program' '-fltrans-output-list=/tmp/ccZEu3t3.ltrans.out' '-fltrans' '-o' '/tmp/ccZEu3t3.ltrans0.ltrans.o' /usr/lib/gcc/avr/4.7.2/lto1 -quiet -dumpbase ccZEu3t3.ltrans0.o -mmcu=atmega328p -auxbase-strip /tmp/ccZEu3t3.ltrans0.ltrans.o -Os -version -fwhole-program -fltrans-output-list=/tmp/ccZEu3t3.ltrans.out -fltrans @/tmp/ccAudUT3 -o /tmp/ccyDScYi.s GNU GIMPLE (GCC) version 4.7.2 (avr) compiled by GNU C version 4.7.2, GMP version 5.0.5, MPFR version 3.1.0-p10, MPC version 0.9 GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072 GNU GIMPLE (GCC) version 4.7.2 (avr) compiled by GNU C version 4.7.2, GMP version 5.0.5, MPFR version 3.1.0-p10, MPC version 0.9 GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072 COLLECT_GCC_OPTIONS='-c' '-v' '-mmcu=atmega328p' '-Os' '-fwhole-program' '-fltrans-output-list=/tmp/ccZEu3t3.ltrans.out' '-fltrans' '-o' '/tmp/ccZEu3t3.ltrans0.ltrans.o' /usr/lib/gcc/avr/4.7.2/../../../avr/bin/as -mmcu=atmega328p -mno-skip-bug -o /tmp/ccZEu3t3.ltrans0.ltrans.o /tmp/ccyDScYi.s COMPILER_PAT
[Bug target/56533] Linker problem on avr with lto and main function inside archive
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56533

--- Comment #3 from Matthijs Kooijman 2013-03-05 13:06:18 UTC ---
Seems I made a wrong observation in my original report: When I link main.o instead of main.a, the problem does _not_ go away. In fact, I can remove a few more flags then, while still keeping the problem around:

$ ./do
+ rm -f main.a main.o realmain.o
+ /usr/bin/avr-gcc -c main.c -o main.o
+ /usr/bin/avr-gcc -c -flto realmain.c -o realmain.o
+ /usr/bin/avr-gcc -flto -fwhole-program realmain.o main.o
main.o: In function `main':
main.c:(.text+0x8): undefined reference to `realmain'
collect2: error: ld returned 1 exit status

main.c and realmain.c are the same as before.

However, adding -flto to the main.c compilation makes the problem disappear again. I suspect that this means that without -flto, main.o is passed straight to the linker and with -flto it is included in link-time optimization, which would mean your previous analysis still holds.

$ ./do
+ rm -f *.a main.o realmain.o
+ /usr/bin/avr-gcc -c -flto main.c -o main.o
+ /usr/bin/avr-gcc -c -flto realmain.c -o realmain.o
+ /usr/bin/avr-gcc -flto -fwhole-program realmain.o main.o
[Bug target/56533] Linker problem on avr with lto and main function inside archive
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56533 Matthijs Kooijman changed: What|Removed |Added Status|RESOLVED|UNCONFIRMED Resolution|INVALID | --- Comment #5 from Matthijs Kooijman 2013-03-05 14:38:36 UTC --- Just for future reference, the problem here seems to be that I'm using -fwhole-program, but the GCC LTO cannot actually look at the whole program. In particular, .a archives and .o object files that were compiled without -flto, are passed directly to the linker and not included in LTO. Since -fwhole-program makes the compiler assume that all files that are included in LTO compose the whole program, the compiler removes symbols that look unused, but then turn up missing in the final link. So, I shouldn't have been using -fwhole-program, or I should be aware of the above and set externally_visible attributes as needed if I insist on using -fwhole-program. Ideally, the compiler would ask the linker about which symbols are used in these "non-LTO" objects, which is done by -fuse-linker-plugin (which is implied by -flto). However, on the AVR target, it seems there is no linker plugin (at least not in this particular case), which means that without -fwhole-program, the compiler cannot optimize as much (since it has to assume that all symbols are externally visible).
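As a small illustration of the externally_visible route mentioned above, here is how the definition in realmain.c from the original report could be marked. It is written so that it also compiles as C++ (hence the extern "C", which is not needed in a plain C file), and the `return 0` is added only to keep the sketch well-defined. The attribute itself is a GCC extension; other compilers may ignore it with a warning.

```cpp
#include <cassert>

// realmain() is called from main.o, which in the report was compiled
// without -flto and is therefore invisible to the LTO whole-program view.
// Marking the definition externally_visible tells GCC to keep the symbol
// even under -fwhole-program.
extern "C" __attribute__((externally_visible)) int realmain(void) {
    return 0;
}
```

The other option from the comment, dropping -fwhole-program, needs no source change but gives the optimizer less freedom when no linker plugin is available.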
[Bug target/56533] Linker problem on avr with lto and main function inside archive
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56533 Matthijs Kooijman changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution||INVALID --- Comment #6 from Matthijs Kooijman 2013-03-05 14:40:15 UTC --- w00ps, didn't mean to change the resolution.
[Bug rtl-optimization/51447] [4.7 Regression] global register variable definition incorrectly removed as dead code
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51447 Matthijs Kooijman changed: What|Removed |Added CC||matthijs at stdin dot nl --- Comment #19 from Matthijs Kooijman --- In case anyone else comes across here and wonders: This fix made it into 4.8, but was not backported into 4.7.3. Regarding the bug description that says "4.7 regression", I have also observed this bug on avr-gcc 4.3.3, so it's not a regression introduced in 4.7. I also noticed this bug on the AVR platform, using 4.7.2. Just in case it helps (perhaps for others to find this bug when Googling for avr-gcc), here's the testcase and bugreport I was preparing before I found this one. // // Under some circumstances, writes to a global register variable are // optimized away, even though that changes behaviour. The below example // illustrates this. // // When compiled as-is, the writes to the variable "global" are removed. // However, when compiling with -DDO_CALL, which adds a function call to // the main function, the writes are preserved. This leads me to believe // that the optimizer sees that main() isn't calling any functions, so // it must be safe to just remove the writes (even though documentation // [1] says "Stores into this register are never deleted even if they // appear to be dead, but references may be deleted or moved or // simplified.") // // It seems that a second condition (in addition to no functions called) // is that the main function does not return. If we add a return path, // the writes show up again. // // However, removing these writes does not seem sane, since there is // also an interrupt routine, which can access the variable, but the // optimizer is apparently not aware that this is a possibility. 
//
//
// Tested using:
//   avr-gcc -mmcu=attiny13 register.c -S -o - -O
//   avr-gcc -mmcu=attiny13 register.c -S -o - -O -DDO_CALL
//   avr-gcc -mmcu=attiny13 register.c -S -o - -O -DDO_RETURN
//
// [1]:
// http://gcc.gnu.org/onlinedocs/gcc/Global-Reg-Vars.html#Global-Reg-Vars

#include "avr/io.h"
#include "avr/cpufunc.h"

// Define a global variable in a register
register char global asm("r16");

// Just a dummy function
void foo() {
    // Add some nops so this function doesn't get inlined
    _NOP(); _NOP(); _NOP();
}

// Define an ISR that accesses the global. This doesn't actually seem to
// make a difference, except that if this wasn't here, removing writes to
// the global would be acceptable
void ISR(INT0_vect) { PORTB = global; }

void main() {
    global = 1;
    while(1) {
#ifdef DO_CALL
        foo();
#endif
#ifdef DO_RETURN
        return;
#endif
    }
}
[Bug c++/78609] invalid member's visibility detection in constexpr
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78609 Matthijs Kooijman changed: What|Removed |Added CC||matthijs at stdin dot nl --- Comment #2 from Matthijs Kooijman --- I also ran into this problem. It seems that gcc somehow inlines c_str() (or rather, evaluates the constexpr variable it is assigned to) before visibility checks (possibly because the constexpr is evaluated before template initialization?). Below is a smaller example, which is confirmed broken up to gcc 8. I could not reduce this example any further, so it seems the essential pattern that triggers this is:
 - There is a class instance in a constexpr variable with static storage duration
 - A pointer to a private member of this object is accessed through a method
 - This pointer is assigned to a constexpr variable
 - This pointer is assigned in a template instantiation

Here's the code:

class foo {
    char c;
public:
    //constexpr foo(): c(1) { }
    //constexpr foo(char c): c(c) { }
    constexpr const char* c_str() const { return &c; }
};

constexpr foo basename = foo(); // Fails
// These also fail, if you add the appropriate constructor above
//static constexpr foo basename = foo(1); // Fails
//static constexpr foo basename(1); // Fails
//static constexpr foo basename{1}; // Fails
//static constexpr foo basename{}; // Fails
// Surprisingly this works (but needs a constructor above):
//static constexpr foo basename; // Works

template<typename T>
void call() {
    // This is where it breaks
    constexpr const char *unused = basename.c_str();
}

int main() {
    // Instantiate the call function
    call<int>();
}

// Removing the template argument on T makes it work
// Letting T be deduced by adding an argument to call() also fails
// Making the "unused" variable non-constexpr makes it work
// Making get() return c instead of &c makes it work
// Making "basename" a static variable inside call() also fails
//
// Tested on avr-gcc 4.9.2, gcc Debian 6.3.0-18, gcc Debian
// 7.2.0-19, gcc Debian 8-20180110-1

$ avr-gcc ATest.cpp -std=c++11
ATest.cpp: In instantiation of ‘void call() [with T = int]’:
ATest.cpp:26:13: required from here
ATest.cpp:2:10: error: ‘char foo::c’ is private
     char c;
          ^
ATest.cpp:21:49: error: within this context
     constexpr const char *unused = basename.c_str();
                                                 ^
[Bug preprocessor/51259] no escape on control characters on linemarker lines
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51259 Matthijs Kooijman changed: What|Removed |Added CC||matthijs at stdin dot nl --- Comment #2 from Matthijs Kooijman --- I just stumbled upon this same issue. The most direct course seems to be to fix the documentation to match the implementation. In his comment Shakthi Kannan runs the gcc -E output through hexdump -c and says that the octal value is present there, but that's just hexdump showing any non-printable characters in the file using an octal value. The raw byte is still present in the gcc output. To confirm, here's another testcase:

$ touch $'foo\001bar.cpp'
$ gcc -E foo^Abar.cpp
# 1 "foobar.cpp"
# 1 "<built-in>"
# 1 "<command-line>"
# 1 "/usr/include/stdc-predef.h" 1 3 4
# 1 "<command-line>" 2
# 1 "foobar.cpp"
$ gcc -E foo^Abar.cpp |hd
00  23 20 31 20 22 66 6f 6f  01 62 61 72 2e 63 70 70  |# 1 "foo.bar.cpp|
10  22 0a 23 20 31 20 22 3c  62 75 69 6c 74 2d 69 6e  |".# 1 "<built-in|
20  3e 22 0a 23 20 31 20 22  3c 63 6f 6d 6d 61 6e 64  |>".# 1 "<command|
30  2d 6c 69 6e 65 3e 22 0a  23 20 31 20 22 2f 75 73  |-line>".# 1 "/us|
40  72 2f 69 6e 63 6c 75 64  65 2f 73 74 64 63 2d 70  |r/include/stdc-p|
50  72 65 64 65 66 2e 68 22  20 31 20 33 20 34 0a 23  |redef.h" 1 3 4.#|
60  20 31 20 22 3c 63 6f 6d  6d 61 6e 64 2d 6c 69 6e  | 1 "<command-lin|
70  65 3e 22 20 32 0a 23 20  31 20 22 66 6f 6f 01 62  |e>" 2.# 1 "foo.b|
80  61 72 2e 63 70 70 22 0a                           |ar.cpp".|
88

(Note that my terminal seems to hide the control character in the direct gcc output, but obviously no octal escape is present, and hexdump confirms that the raw byte is present)

Looking at the code, you can see the line marker is generated here: https://github.com/gcc-mirror/gcc/blob/edd716b6b1caa1a5cb320a8cd7f626f30198e098/gcc/c-family/c-ppoutput.c#L413-L415

And the escaping of the filename happens here: https://github.com/gcc-mirror/gcc/blob/a588355ab948cf551bc9d2b89f18e5ae5140f52c/libcpp/macro.c#L491-L511

So only \ and " are escaped, nothing else.
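To make the difference concrete, here is a minimal sketch of the two quoting behaviours discussed in this comment. `quote_filename` and its `escape_control` flag are hypothetical names made up for illustration; this is not the libcpp code, only an approximation of "escape \ and " only" versus "also emit control characters as three-digit octal escapes".

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper, not taken from GCC: quote a filename for a
// linemarker-style line. Backslash and double quote are always escaped.
// With escape_control set, control characters are additionally written
// as three-digit octal escapes instead of raw bytes.
std::string quote_filename(const std::string& name, bool escape_control) {
    std::string out = "\"";
    for (unsigned char c : name) {
        if (c == '\\' || c == '"') {
            out += '\\';
            out += static_cast<char>(c);
        } else if (escape_control && (c < 0x20 || c == 0x7f)) {
            char buf[5];  // backslash + three octal digits + NUL
            std::snprintf(buf, sizeof buf, "\\%03o", static_cast<unsigned>(c));
            out += buf;
        } else {
            out += static_cast<char>(c);
        }
    }
    out += '"';
    return out;
}
```

With `escape_control` off, the 0x01 byte from the testcase passes through unchanged, which matches the hexdump above; with it on, the same name comes out with a literal backslash followed by 001.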
[Bug c++/43745] [avr] g++ puts VTABLES in SRAM
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=43745 Matthijs Kooijman changed: What|Removed |Added CC||matthijs at stdin dot nl --- Comment #12 from Matthijs Kooijman --- Apologies if this is an obvious question, but I'm not familiar with gcc/g++ internals. Georg-Johann, you say this requires address space support in c++, but I'm not sure I follow you there. Two things: - You say WG21 will never add AS support to C++, but also say that language support for AS is not needed, only internal support in gcc/g++. So that means what WG21 does is not relevant for vtable handling in particular? - Even if AS would not be used, what prevents g++ from emitting the vtables in the `progmem.data` section and generating vtable-invocation code using LPM instructions? This behaviour could be toggled using a commandline option, or some gcc-specific attribute on a class? I would be happy if you could comment on the feasibility of these two approaches, thanks!
[Bug c++/43745] [avr] g++ puts VTABLES in SRAM
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=43745 --- Comment #14 from Matthijs Kooijman --- Thanks for the additional explanations!
[Bug other/60145] [AVR] Suboptimal code for byte order shuffling using shift and or
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60145 --- Comment #3 from Matthijs Kooijman --- Thanks for digging into this :-D I suppose you meant https://gcc.gnu.org/viewcvs/gcc?view=revision&revision=242907 instead of the commit you linked (which is also nice btw, I noticed that extra sbiw in some places as well). Looking at the generated assembly, the optimizations look like fairly simple (composable) translations, but I assume that the optimization needs to happen before/while the assembly is generated, not afterwards. And then I can see that the patterns would indeed become complex. My goal was indeed to compose values. Using a union is endian-dependent, which is a downside. If I understand your vector-example correctly, vectors are always stored as big endian, so using this approach would be portable? I couldn't find anything about this in the documentation, though.
[Bug target/77326] [avr] Invalid optimization omits comparison
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77326 --- Comment #6 from Matthijs Kooijman --- Thanks!
[Bug target/60300] [avr] Suboptimal stack pointer manipulation for frame setup
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60300 --- Comment #3 from Matthijs Kooijman --- Hmm, I don't think the gcc sources support that. AFAICT, they attempt to just find the shortest approach, without caring for speed. For example, look at avr.c, around line 1265, where it says:

/* Use shortest method */
emit_insn (get_sequence_length (sp_plus_insns) < get_sequence_length (fp_plus_insns) ? sp_plus_insns : fp_plus_insns);

https://github.com/mirrors/gcc/blob/c2e306f5efb32b7eed856a1844487cff09aa86ac/gcc/config/avr/avr.c#L1265-L1270
[Bug target/60300] [avr] Suboptimal stack pointer manipulation for frame setup
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60300 --- Comment #5 from Matthijs Kooijman --- Ah, then the comments are a bit misleading, yes. Wouldn't it make sense to put this decision outside of avr_sp_immediate_operand, in the same area where the decision between the two options is made? That might lead to a bit of duplication, though, since it seems the function is called twice. In any case, from a user perspective, it surprises me that this exception is made, even when compiling with -Os. Wouldn't it make sense to ignore the range check with -Os? Or is -Os really only used to determine the list of optimizations to (not) run and not supposed to influence the behaviour of the compiler otherwise?
[Bug other/60145] New: [AVR] Suboptimal code for byte order shuffling using shift and or
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60145 Bug ID: 60145 Summary: [AVR] Suboptimal code for byte order shuffling using shift and or Product: gcc Version: 4.8.2 Status: UNCONFIRMED Severity: normal Priority: P3 Component: other Assignee: unassigned at gcc dot gnu.org Reporter: matthijs at stdin dot nl (Not sure what the component should be, just selected "other" for now) Using shifts and bitwise-or to compose multiple bytes into a bigger integer results in suboptimal code on AVR. For example, a few simple functions that take two or four bytes and compose them into (big endian) integers. Since AVR is an 8-bit platform, this essentially just means moving two bytes from the argument register to the return value registers. However, the outputted assembly is significantly bigger than that and contains obvious optimization opportunities. The example below also contains a version that uses a union to compose the integer, which gets optimized as expected (but only works on little-endian systems, since it relies on the native endianness of uint16_t). 
matthijs@grubby:~$ cat foo.c
#include <stdint.h>

uint16_t join2(uint8_t a, uint8_t b) {
    return ((uint16_t)a << 8) | b;
}

uint16_t join2_efficient(uint8_t a, uint8_t b) {
    union { uint16_t uint; uint8_t arr[2]; } tmp = {.arr = {b, a}};
    return tmp.uint;
}

uint32_t join4(uint8_t a, uint8_t b, uint8_t c, uint8_t d) {
    return ((uint32_t)a << 24) | ((uint32_t)b << 16) | ((uint32_t)c << 8) | d;
}

matthijs@grubby:~$ avr-gcc -c foo.c -O3 && avr-objdump -d foo.o

foo.o: file format elf32-avr

Disassembly of section .text:

00000000 <join2>:
   0: 70 e0   ldi  r23, 0x00 ; 0
   2: 26 2f   mov  r18, r22
   4: 37 2f   mov  r19, r23
   6: 38 2b   or   r19, r24
   8: 82 2f   mov  r24, r18
   a: 93 2f   mov  r25, r19
   c: 08 95   ret

0000000e <join2_efficient>:
   e: 98 2f   mov  r25, r24
  10: 86 2f   mov  r24, r22
  12: 08 95   ret

00000014 <join4>:
  14: 0f 93   push r16
  16: 1f 93   push r17
  18: 02 2f   mov  r16, r18
  1a: 10 e0   ldi  r17, 0x00 ; 0
  1c: 20 e0   ldi  r18, 0x00 ; 0
  1e: 30 e0   ldi  r19, 0x00 ; 0
  20: 14 2b   or   r17, r20
  22: 26 2b   or   r18, r22
  24: 38 2b   or   r19, r24
  26: 93 2f   mov  r25, r19
  28: 82 2f   mov  r24, r18
  2a: 71 2f   mov  r23, r17
  2c: 60 2f   mov  r22, r16
  2e: 1f 91   pop  r17
  30: 0f 91   pop  r16
  32: 08 95   ret
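As a side note on the portability remark in the report, the sketch below contrasts the two composition styles in a host-testable form. It is an illustration, not the original testcase: the union is replaced by a `memcpy` (type punning through a union is not well-defined in C++), and `join2_bytes_le` and `host_is_little_endian` are hypothetical helper names.

```cpp
#include <cstdint>
#include <cstring>

// Shift-and-or composition: the result is defined purely by value, so it is
// the same on little- and big-endian hosts.
inline uint16_t join2(uint8_t a, uint8_t b) {
    return static_cast<uint16_t>((static_cast<uint16_t>(a) << 8) | b);
}

// Byte-copy composition (stand-in for the union version): the low byte is
// stored first, so this only matches join2() on a little-endian host.
inline uint16_t join2_bytes_le(uint8_t a, uint8_t b) {
    const uint8_t bytes[2] = {b, a};
    uint16_t out;
    std::memcpy(&out, bytes, sizeof out);
    return out;
}

// Runtime probe for the host byte order.
inline bool host_is_little_endian() {
    const uint16_t probe = 1;
    uint8_t first;
    std::memcpy(&first, &probe, 1);
    return first == 1;
}
```

On a little-endian host both functions return 0x1234 for the arguments (0x12, 0x34); on a big-endian host only `join2` would.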
[Bug target/60300] New: [avr] Suboptimal stack pointer manipulation for frame setup
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60300 Bug ID: 60300 Summary: [avr] Suboptimal stack pointer manipulation for frame setup Product: gcc Version: 4.8.2 Status: UNCONFIRMED Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: matthijs at stdin dot nl

For setting up the stack frame in the function prologue, gcc chooses between either directly manipulating the stack pointer with "rcall ." and "push" instructions, or copying it to the frame pointer, modifying that and copying it back, depending on which is shorter. However, when the frame size is 7 or more, gcc picks the frame-pointer approach, even when the direct manipulation approach would be shorter.

Here's the example (lines with dashes added by me to indicate the relevant code):

$ cat foo.c
#include <stdint.h>

void bar(uint8_t *);

void foo() {
    uint8_t x[SIZE];
    bar(x);
}

$ diff -u <(avr-gcc -DSIZE=6 -c foo.c -o - -S) <(avr-gcc -D SIZE=7 -c foo.c -o - -S)
--- /dev/fd/63 2014-02-21 13:04:18.531142523 +0100
+++ /dev/fd/62 2014-02-21 13:04:18.535142628 +0100
@@ -10,21 +10,24 @@
 foo:
     push r28
     push r29
-    rcall .
-    rcall .
-    rcall .
     in r28,__SP_L__
     in r29,__SP_H__
+    sbiw r28,7
+    in __tmp_reg__,__SREG__
+    cli
+    out __SP_H__,r29
+    out __SREG__,__tmp_reg__
+    out __SP_L__,r28
 /* prologue: function */
-/* frame size = 6 */
-/* stack size = 8 */
-.L__stack_usage = 8
+/* frame size = 7 */
+/* stack size = 9 */
+.L__stack_usage = 9
     mov r24,r28
     mov r25,r29
     adiw r24,1
     rcall bar
 /* epilogue start */
-    adiw r28,6
+    adiw r28,7
     in __tmp_reg__,__SREG__
     cli
     out __SP_H__,r29

As you can see, for SIZE=7 it switches to a 6-instruction sequence, when a 4-instruction sequence (3x rcall + 1x push) would also suffice.
Relevant code seems to be avr_prologue_setup_frame and avr_out_addto_sp:
 - https://github.com/mirrors/gcc/blob/c2e306f5efb32b7eed856a1844487cff09aa86ac/gcc/config/avr/avr.c#L1109-L1278
 - https://github.com/mirrors/gcc/blob/c2e306f5efb32b7eed856a1844487cff09aa86ac/gcc/config/avr/avr.c#L7002-L7014

That code tries both approaches to see which one is smaller, so presumably it gets the size of either of them wrong and thus makes the wrong decision.

Note that for the epilogue, the compiler has the turnover point at the expected 5/6 bytes of frame size:

$ diff -u <(avr-gcc -DSIZE=5 -c foo.c -o - -S) <(avr-gcc -D SIZE=6 -c foo.c -o - -S)
--- /dev/fd/63 2014-02-21 13:05:55.825616219 +0100
+++ /dev/fd/62 2014-02-21 13:05:55.821616121 +0100
@@ -12,23 +12,24 @@
     push r29
     rcall .
     rcall .
-    push __zero_reg__
+    rcall .
     in r28,__SP_L__
     in r29,__SP_H__
 /* prologue: function */
-/* frame size = 5 */
-/* stack size = 7 */
-.L__stack_usage = 7
+/* frame size = 6 */
+/* stack size = 8 */
+.L__stack_usage = 8
     mov r24,r28
     mov r25,r29
     adiw r24,1
     rcall bar
 /* epilogue start */
-    pop __tmp_reg__
-    pop __tmp_reg__
-    pop __tmp_reg__
-    pop __tmp_reg__
-    pop __tmp_reg__
+    adiw r28,6
+    in __tmp_reg__,__SREG__
+    cli
+    out __SP_H__,r29
+    out __SREG__,__tmp_reg__
+    out __SP_L__,r28
     pop r29
     pop r28
     ret
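The instruction counts used in this report can be written down as a small cost model. This is only a sketch of the arithmetic, not how avr.c computes sequence lengths: the function names are hypothetical, it assumes a device where "rcall ." reserves 2 bytes of stack, and it counts instructions only (no cycle counts, no operand range limits).

```cpp
// Instructions needed to reserve `frame` bytes directly on the stack:
// one "rcall ." per 2 bytes (assumes a 2-byte return address), plus one
// "push" if a single byte is left over.
constexpr unsigned direct_sp_insns(unsigned frame) {
    return frame / 2 + frame % 2;
}

// Length of the frame-pointer sequence shown in the diffs above:
// sbiw, in (SREG), cli, out (SP_H), out (SREG), out (SP_L).
constexpr unsigned frame_pointer_insns() {
    return 6;
}

// True when the direct approach is strictly shorter by instruction count.
constexpr bool direct_is_shorter(unsigned frame) {
    return direct_sp_insns(frame) < frame_pointer_insns();
}
```

Under these assumptions a 7-byte frame costs 4 instructions the direct way against 6 for the frame-pointer way, and the direct way stays shorter up to a 10-byte frame.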
[Bug target/60300] [avr] Suboptimal stack pointer manipulation for frame setup
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60300 --- Comment #1 from Matthijs Kooijman --- I noticed I didn't use -O in the output I pasted, but I just confirmed that the results are the same with -Os and -O3.
[Bug tree-optimization/45791] Missed devirtualization
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45791 Matthijs Kooijman changed: What|Removed |Added CC||matthijs at stdin dot nl --- Comment #15 from Matthijs Kooijman --- I ran into another variant of this problem, which I reduced to the following testcase. I found the problem on 4.8.2, but it is already fixed in trunk / gcc-4.9 (Debian 4.9-20140218-1). Still, it might be useful to have the testcase here for reference. class Base { }; class Sub : public Base { public: virtual void bar(); }; Sub foo; Sub * const pointer = &foo; Sub* function() { return &foo; }; int main() { // Gcc 4.8.2 devirtualizes this: pointer->bar(); // but not this: function()->bar(); }
[Bug other/60040] AVR: error: unable to find a register to spill in class 'POINTER_REGS'
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60040 Matthijs Kooijman changed: What|Removed |Added CC||matthijs at stdin dot nl --- Comment #8 from Matthijs Kooijman --- Seems not - just tried with avr-gcc 5.1 and it is still broken: $ avr-gcc -fpreprocessed -w -mmcu=atmega128 -O2 -s test.i -o /dev/null test.i: In function 'rtems_fdisk_recycle_segment': test.i:107:1: error: unable to find a register to spill in class 'POINTER_REGS' } ^ test.i:107:1: error: this is the insn: (insn 30 29 31 2 (set (reg:HI 26 r26) (reg/v/f:HI 51 [ dpd ])) /home/matthijs/test.i:95 83 {*movhi} (nil)) test.i:107: confused by earlier errors, bailing out $ avr-gcc --version avr-gcc (GCC) 5.1.0 Copyright (C) 2015 Free Software Foundation, Inc. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. On the Arduino bugtracker [1], another testcase with the same symptoms was reported. I'm attaching that here. This testcase works with -O2, but breaks when -Os is used. $ avr-gcc -c -Os -mmcu=atmega328p test2.c -o /dev/null test2.c: In function 'getSlope': test2.c:22:1: error: unable to find a register to spill in class 'POINTER_REGS' } ^ test2.c:22:1: error: this is the insn: (insn 40 38 42 3 (set (reg:SF 63 [ D.1613 ]) (mem:SF (post_inc:HI (reg:HI 16 r16 [orig:73 ivtmp.13 ] [73])) [1 MEM[base: _27, offset: 0B]+0 S4 A8])) /home/matthijs/test.c:15 100 {*movsf} (expr_list:REG_INC (reg:HI 16 r16 [orig:73 ivtmp.13 ] [73]) (nil))) test2.c:22: confused by earlier errors, bailing out [1]: https://github.com/arduino/Arduino/issues/3972
[Bug other/60040] AVR: error: unable to find a register to spill in class 'POINTER_REGS'
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60040 --- Comment #9 from Matthijs Kooijman --- Created attachment 36499 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=36499&action=edit Second testcase, needs -Os to break
[Bug target/66511] New: [avr] whole-byte shifts not optimized away for uint64_t
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66511 Bug ID: 66511 Summary: [avr] whole-byte shifts not optimized away for uint64_t Product: gcc Version: 4.8.1 Status: UNCONFIRMED Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: matthijs at stdin dot nl Target Milestone: ---

When doing whole-byte shifts, gcc usually optimizes away the shifts and ends up moving data between registers instead. However, it seems this doesn't happen when uint64_t is used.

Here's an example (assembler output slightly trimmed of unrelated comments and annotations etc.):

matthijs@grubby:~$ cat test.cpp
#include <stdint.h>

uint8_t foo64_8(uint64_t a) { return a >> 8; }
uint16_t foo64_16(uint64_t a) { return a >> 8; }
uint8_t foo32_8(uint32_t a) { return a >> 8; }
uint16_t foo32_16(uint32_t a) { return (a >> 8); }

matthijs@grubby:~$ avr-gcc -Os test.cpp -S -o -
_Z7foo64_8y:
    push r16
    ldi r16,lo8(8)
    rcall __lshrdi3
    mov r24,r18
    pop r16
    ret
_Z8foo64_16y:
    push r16
    ldi r16,lo8(8)
    rcall __lshrdi3
    mov r24,r18
    mov r25,r19
    pop r16
    ret
_Z7foo32_8m:
    mov r24,r23
    ret
_Z8foo32_16m:
    clr r27
    mov r26,r25
    mov r25,r24
    mov r24,r23
    ret
    .ident "GCC: (GNU) 4.9.2 20141224 (prerelease)"

The output is identical for 4.8.1 on Debian, and the above 4.9.2 on Arch. I haven't found a readily available 5.x package yet to test.

As you can see, the versions operating on 64 bit values preserve the 8-bit shift (which is very inefficient on AVR), while the versions running on 32 bit values simply copy the right registers. The foo32_16 function still has some useless instructions (r27 and r26 are not part of the return value, not sure why these are set) but that is probably an unrelated problem.

I've marked this with component "target", since I think these optimizations are avr-specific (or at least not applicable to bigger architectures).
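Since the 32-bit variants above are compiled to plain register moves, one thing a user could try is narrowing the 64-bit value before shifting. The sketch below only demonstrates that the narrowed forms compute the same values; whether avr-gcc actually emits better code for them has not been checked here, and the function names are made up for the example.

```cpp
#include <cstdint>

// The forms from the report: shift the full 64-bit value, then truncate.
inline uint8_t shift_byte1(uint64_t a) {
    return static_cast<uint8_t>(a >> 8);
}
inline uint16_t shift_word1(uint64_t a) {
    return static_cast<uint16_t>(a >> 8);
}

// Narrow first, then shift. Bits 8..15 of the input only depend on its low
// 16 bits, and bits 8..23 only on its low 32 bits, so the results match.
inline uint8_t narrow_then_shift_byte1(uint64_t a) {
    return static_cast<uint8_t>(static_cast<uint16_t>(a) >> 8);
}
inline uint16_t narrow_then_shift_word1(uint64_t a) {
    return static_cast<uint16_t>(static_cast<uint32_t>(a) >> 8);
}
```

This equivalence holds because the shift amount (8) plus the width of the result never exceeds the width of the narrowed type.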
[Bug target/66511] [avr] whole-byte shifts not optimized away for uint64_t
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66511 --- Comment #2 from Matthijs Kooijman --- So, IIUC, this is quite hard to fix? Either you use lib functions, which prevents the optimizer from just relabeling or copying registers to apply shifting, or you don't and then more complex operations will become very verbose and messy? Would it make sense (and be possible) to add a special case to not use lib functions for shifts by a constant number of bits that is also a multiple of 8? At first glance, that would make a lot of common cases (where an integer is decomposed into separate bytes or other parts) a lot faster, while still keeping the lib functions for more complex operations?
[Bug target/77326] New: [avr] Invalid optimization using varargs and a weak function
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77326 Bug ID: 77326 Summary: [avr] Invalid optimization using varargs and a weak function Product: gcc Version: 5.1.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: matthijs at stdin dot nl Target Milestone: --- Created attachment 39483 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=39483&action=edit Preprocessed source generated by avr-gcc foo.c -Dissue -save-temps This bug was originally reported to the Arduino bug tracker[1], but seems to be a avr-specific gcc bug. A minimal program showing the problem: #include #include void test(void) __attribute__((weak)); void va_pseudo(int flag,...){ va_list ap; va_start (ap, flag); va_end (ap); } int main(void) { #if defined(issue) va_pseudo(1, 2, 3, 4); #else va_pseudo(1, 2, 3); #endif if(test!=NULL) { test(); } return 0; } When compiled with -O but without -Dissue, this produces the following assembler: $ avr-gcc foo.c -O; avr-objdump -d a.out a.out: file format elf32-avr Disassembly of section .text: : 0: cf 93 pushr28 2: df 93 pushr29 4: cd b7 in r28, 0x3d ; 61 6: de b7 in r29, 0x3e ; 62 8: df 91 pop r29 a: cf 91 pop r28 c: 08 95 ret 000e : e: 1f 92 pushr1 10: 83 e0 ldi r24, 0x03 ; 3 12: 8f 93 pushr24 14: 1f 92 pushr1 16: 82 e0 ldi r24, 0x02 ; 2 18: 8f 93 pushr24 1a: 1f 92 pushr1 1c: 81 e0 ldi r24, 0x01 ; 1 1e: 8f 93 pushr24 20: ef df rcall .-34; 0x0 22: 0f 90 pop r0 24: 0f 90 pop r0 26: 0f 90 pop r0 28: 0f 90 pop r0 2a: 0f 90 pop r0 2c: 0f 90 pop r0 2e: 80 e0 ldi r24, 0x00 ; 0 30: 90 e0 ldi r25, 0x00 ; 0 32: 89 2b or r24, r25 34: 09 f0 breq.+2 ; 0x38 36: e4 df rcall .-56; 0x0 38: 80 e0 ldi r24, 0x00 ; 0 3a: 90 e0 ldi r25, 0x00 ; 0 3c: 08 95 ret Note the lines from 0x2e to 0x34, which implement the `if(test!=NULL)`, which should of course always fail and skip the next `rcall`. 
Now, when compiling this with -Dissue, the `or r24, r25` line gets dropped, making the generated code invalid: $ avr-gcc foo.c -O -Dissue; avr-objdump -d a.out | grep -B 2 breq 38: 80 e0 ldi r24, 0x00 ; 0 3a: 90 e0 ldi r25, 0x00 ; 0 3c: 09 f0 breq.+2 ; 0x40 <__SREG__+0x1> The diff between without and with -Dissue looks like this (jump addresses have been stripped to minimize the diff): @@ -15,6 +15,9 @@ : : 1f 92 pushr1 +84 e0 ldi r24, 0x04 ; 4 +8f 93 pushr24 +1f 92 pushr1 83 e0 ldi r24, 0x03 ; 3 8f 93 pushr24 1f 92 pushr1 @@ -24,16 +27,17 @@ : 81 e0 ldi r24, 0x01 ; 1 8f 93 pushr24 xx xx rcall ; -0f 90 pop r0 -0f 90 pop r0 -0f 90 pop r0 -0f 90 pop r0 -0f 90 pop r0 -0f 90 pop r0 +8d b7 in r24, 0x3d ; 61 +9e b7 in r25, 0x3e ; 62 +08 96 adiwr24, 0x08 ; 8 +0f b6 in r0, 0x3f; 63 +f8 94 cli +9e bf out 0x3e, r25 ; 62 +0f be out
[Bug target/77326] [avr] Invalid optimization using varargs and a weak function
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77326 --- Comment #1 from Matthijs Kooijman --- The original reporter just added a comment that this does not occur anymore in gcc 6.1.0, though I haven't got anything newer than 5.1 available here to check.
[Bug target/100219] New: Arm/Cortex-M: Suboptimal code returning unaligned struct with non-empty stack frame
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100219 Bug ID: 100219 Summary: Arm/Cortex-M: Suboptimal code returning unaligned struct with non-empty stack frame Product: gcc Version: 10.2.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: matthijs at stdin dot nl Target Milestone: ---

Consider the program below, which deals with functions returning a struct of two members, either using a literal value or by forwarding the return value from another function. When the struct has no alignment, this results in suboptimal code that breaks the struct (stored in a single register) apart into its members and reassembles them into the struct in a single register again, where it could just have done absolutely nothing. Giving the struct some alignment somehow prevents this problem from occurring.

Consider this program:

$ cat Foo.c
struct Result { char a, b; }
#if defined(ALIGN)
__attribute((aligned(ALIGN)))__
#endif
;

struct Result other(const int*);

struct Result func1() { int x; return other(&x); }
struct Result func2() { struct Result y = {0x12, 0x34}; return y; }
struct Result func3() { return other(0); }

Which produces the following code:

$ arm-linux-gnueabi-gcc-10 --version
arm-linux-gnueabi-gcc-10 (Ubuntu 10.2.0-5ubuntu1~20.04) 10.2.0
$ arm-linux-gnueabi-gcc-10 -fno-stack-protector -mcpu=cortex-m4 -c -O3 ~/Foo.c && objdump -d Foo.o

00000000 <func1>:
   0: b500        push  {lr}
   2: b083        sub   sp, #12
   4: a801        add   r0, sp, #4
   6: f7ff fffe   bl    0
   a: 4603        mov   r3, r0
   c: b2da        uxtb  r2, r3
   e: 2000        movs  r0, #0
  10: f362 0007   bfi   r0, r2, #0, #8
  14: f3c3 2307   ubfx  r3, r3, #8, #8
  18: f363 200f   bfi   r0, r3, #8, #8
  1c: b003        add   sp, #12
  1e: f85d fb04   ldr.w pc, [sp], #4
  22: bf00        nop

00000024 <func2>:
  24: f243 4312   movw  r3, #13330 ; 0x3412
  28: f003 0212   and.w r2, r3, #18
  2c: 2000        movs  r0, #0
  2e: f362 0007   bfi   r0, r2, #0, #8
  32: 0a1b        lsrs  r3, r3, #8
  34: b082        sub   sp, #8
  36: f363 200f   bfi   r0, r3, #8, #8
  3a: b002        add   sp, #8
  3c: 4770        bx    lr
  3e: bf00        nop

00000040 <func3>:
  40: b082        sub   sp, #8
  42: 2000        movs  r0, #0
  44: b002        add   sp, #8
  46: f7ff bffe   b.w   0
  4a: bf00        nop

Especially note func2, which correctly builds the struct using a single word literal, and then continues to break it apart and rebuild it.

Note that I added -fno-stack-protector to make the generated code more concise, but the problem occurs even without this option.

Somehow, the alignment influences this, since adding some alignment makes the problem disappear:

$ arm-linux-gnueabi-gcc-10 -fno-stack-protector -mcpu=cortex-m4 -c -O3 ~/Foo.c -DALIGN=2 && objdump -d Foo.o

Foo.o: file format elf32-littlearm

Disassembly of section .text:

00000000 <func1>:
   0: b500        push  {lr}
   2: b083        sub   sp, #12
   4: a801        add   r0, sp, #4
   6: f7ff fffe   bl    0
   a: b003        add   sp, #12
   c: f85d fb04   ldr.w pc, [sp], #4

00000010 <func2>:
  10: f243 4012   movw  r0, #13330 ; 0x3412
  14: 4770        bx    lr
  16: bf00        nop

00000018 <func3>:
  18: 2000        movs  r0, #0
  1a: f7ff bffe   b.w   0
  1e: bf00        nop

Other things I've observed:
 - When using ALIGN=2 or ALIGN=4, the problem disappears as shown above. ALIGN=1 is equivalent to no alignment. Using ALIGN=8 also makes the problem disappear, but it seems this causes the return value to be passed in memory, rather than in r0 directly.
 - Using -mcpu=arm8, or arm7tdmi, or some other arm cpus I tried, the problem disappears. With all cortex variants I tried the problem stays, though sometimes it seems slightly less severe.
 - I could not reproduce this on x86_64.
 - Using a struct with just 1 char, the problem disappears.
 - Using a struct with 4 chars, the problem stays (and becomes more pronounced because there's more work to rebuild the struct).
 - Using a struct with 2 shorts, the problem disappears for func2, but stays for func1.
 - Writing something equivalent in C++, the problem also appears (I originally saw this problem in C++ and then tr
[Bug tree-optimization/97997] New: Missed optimization: Multiply of extended integer cannot overflow
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97997 Bug ID: 97997 Summary: Missed optimization: Multiply of extended integer cannot overflow Product: gcc Version: 10.2.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: matthijs at stdin dot nl Target Milestone: ---

When an integer is extended and then multiplied by another integer of the original size, the resulting multiplication can never overflow. However, gcc does not seem to realize this. Consider:

uint16_t calc_u(uint16_t x ) { return (uint32_t)x * 10 / 10; }

If gcc would realize that x * 10 cannot overflow, it can optimize away the * 10 / 10. However, it does not:

$ gcc-10 -Os -Wall -Wextra -pedantic foo.c && objdump -S --disassemble=calc_u a.out
00000000000011a0 <calc_u>:
    11a0: f3 0f 1e fa       endbr64
    11a4: 0f b7 c7          movzwl %di,%eax
    11a7: b9 0a 00 00 00    mov    $0xa,%ecx
    11ac: 31 d2             xor    %edx,%edx
    11ae: 6b c0 0a          imul   $0xa,%eax,%eax
    11b1: f7 f1             div    %ecx
    11b3: c3                retq

When doing the multiplication signed, this optimization *does* happen:

uint16_t calc_s(uint16_t x ) { return (int32_t)x * 10 / 10; }

$ gcc-10 -Os -Wall -Wextra -pedantic foo.c && objdump -S --disassemble=calc_s a.out
0000000000001199 <calc_s>:
    1199: f3 0f 1e fa       endbr64
    119d: 89 f8             mov    %edi,%eax
    119f: c3                retq

Since signed overflow is undefined, gcc presumably assumes that the multiplication does not overflow and optimizes this. This shows that the machinery for this optimization exists and works and suggests that the only thing missing in the unsigned case is realizing that the overflow cannot happen.

The above uses 16/32bit numbers, but the same happens on 32/64bit (just not on 8/16 bit, because then things are integer-promoted and multiplication is always signed). When using -O2 or -O3, the code generated for unsigned is different, but still not fully optimized.

Maybe I'm missing some corner case of the C language that would make this optimization incorrect, but I think it should be allowed.
The original code that triggered this report is: #define ticks2us(t) (uint32_t)((uint64_t)(t)*100 / TICKS_PER_SEC) Which could be optimized to a single multiply or even bitshift rather than a multiply and division for particular values of TICKS_PER_SEC, while staying generally applicable (but slower) for other values. I took a guess at the component, please correct that if needed. $ gcc-10 --version gcc-10 (Ubuntu 10.2.0-5ubuntu1~20.04) 10.2.0 Copyright (C) 2020 Free Software Foundation, Inc. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
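The claim that the widened multiplication cannot overflow is easy to check by hand: the largest uint16_t is 65535, and 65535 * 10 = 655350, far below 2^32. A small sketch that spells this out and checks the identity exhaustively follows; `calc_u` mirrors the function from the report, `kMaxProduct` and `calc_u_is_identity` are hypothetical names added for the check.

```cpp
#include <cstdint>
#include <limits>

// Largest value the widened product can take, computed in 64 bits so the
// bound itself cannot wrap.
constexpr uint64_t kMaxProduct =
    static_cast<uint64_t>(std::numeric_limits<uint16_t>::max()) * 10;

static_assert(kMaxProduct == 655350, "65535 * 10");
static_assert(kMaxProduct <= std::numeric_limits<uint32_t>::max(),
              "the product always fits in 32 bits");

// Same computation as calc_u in the report.
inline uint16_t calc_u(uint16_t x) {
    return static_cast<uint16_t>(static_cast<uint32_t>(x) * 10 / 10);
}

// Exhaustively confirms that "* 10 / 10" is the identity for every input,
// which is what would allow a compiler to drop both operations.
inline bool calc_u_is_identity() {
    for (uint32_t x = 0; x <= 0xFFFF; ++x) {
        if (calc_u(static_cast<uint16_t>(x)) != x) return false;
    }
    return true;
}
```

The same bound argument applies to the `ticks2us` macro: a 32-bit tick count times 100 is well within 64 bits.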
[Bug tree-optimization/97997] Missed optimization: Multiply of extended integer cannot overflow
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97997 --- Comment #5 from Matthijs Kooijman --- Awesome, thanks for the quick response and fix!
[Bug libstdc++/106477] With -fno-exception operator new(nothrow) aborts instead of returning null
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106477 --- Comment #6 from Matthijs Kooijman --- Ah, IIUC your patch does not treat -fno-exceptions specially, but just adds a shortcut for the nothrow new version to skip calling regular new version if it has not been replaced. In a normal build, that saves throw/catch overhead, and in a no-exceptions build that prevents the abort associated with that throw. Clever! One corner case seems to be when the regular new version is replaced in a no-exceptions build, but in that case that replacement has no way to signal failure anyway, and if needed a user can just also replace the nothrow version. I can't comment on the details of the patch wrt aliases and preprocessor stuff, but the approach and the gist of the code looks ok to me.
[Bug libstdc++/106477] New: With -fno-exception operator new(nothrow) aborts instead of returning null
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106477 Bug ID: 106477 Summary: With -fno-exception operator new(nothrow) aborts instead of returning null Product: gcc Version: 11.2.1 Status: UNCONFIRMED Severity: normal Priority: P3 Component: libstdc++ Assignee: unassigned at gcc dot gnu.org Reporter: matthijs at stdin dot nl Target Milestone: --- The nothrow version of operator new is intended to return null on allocation failure. However, when libstdc++ is compiled with -fno-exceptions, it aborts instead. The cause of this failure is that the nothrow operators work by calling the regular operators, catching any allocation failure exception and turning that into a null return. However, with -fno-exceptions, the regular operator aborts instead of throwing, so the nothrow operator never gets a chance to return null. Originally, this *did* work as expected, because the nothrow operators would just call malloc directly. However, as reported in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68210 this violates the C++11 requirement that the nothrow versions must call the regular versions (so applications can replace the regular version and get the nothrow for free), so this was changed in https://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=b66e5a95c0065fda3569a1bfd3766963a848a00d Note this comment by Jonathan Wakely in the linked report, which actually already warns against introducing the behavior I am describing (but the comment was apparently not considered when applying the fix): https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68210#c2 In any case, we have two conflicting requirements: 1. nothrow operators should return null on failure 2. nothrow operators should call regular operators I can see no way to satisfy both. 
Since -fno-exceptions is already violating the spec, it would make sense to me to satisfy only requirement 1 when -fno-exceptions is specified, and allow requirement 2 to be violated (which is more of a fringe case anyway, and applications can always replace the nothrow versions too to get the behavior they need). Essentially this would mean that with -fno-exceptions, the nothrow versions would have to call malloc directly again, like before (either duplicating code like before, or maybe introducing a null-returning helper function?).

To reproduce, I made a small testcase. I was originally seeing this in the Arduino environment on an Atmel samd chip, but I made a self-contained testcase and tested that using gcc from https://developer.arm.com (using the linker script from Atmel/Arduino), which is compiled with -fno-exceptions.

The main testcase is simple: an _sbrk() implementation that always fails, to force allocation failure (overriding the default libnosys implementation that always succeeds), and a single call to operator new that should return null, but aborts:

$ cat test.cpp
#include <new>

volatile void* foo;

extern "C"
void *_sbrk(int n) {
  // Just always fail allocation
  return (void*)-1;
}

int main() {
  // This should return nullptr, but actually aborts (with -fno-exceptions)
  foo = new (std::nothrow) char[65000];
  return 0;
}

In addition, I added a minimal startup.c for memory initialization and reset vector and a linker script taken verbatim from https://github.com/arduino/ArduinoCore-samd/raw/master/variants/arduino_zero/linker_scripts/gcc/flash_without_bootloader.ld (I will attach both files next).
Compiled using:

$ ~/Downloads/gcc-arm-11.2-2022.02-x86_64-arm-none-eabi/bin/arm-none-eabi-gcc -mcpu=cortex-m0plus -mthumb -g -fno-exceptions --specs=nosys.specs --specs=nano.specs -Tflash_without_bootloader.ld -nostartfiles test.cpp startup.c -lstdc++

Running this on the Arduino zero (using openocd and gdb to upload the code through the EDBG port) shows it aborts:

Program received signal SIGINT, Interrupt.
_exit (rc=rc@entry=1) at /data/jenkins/workspace/GNU-toolchain/arm-11/src/newlib-cygwin/libgloss/libnosys/_exit.c:16
16 /data/jenkins/workspace/GNU-toolchain/arm-11/src/newlib-cygwin/libgloss/libnosys/_exit.c: No such file or directory.
(gdb) bt
#0 _exit (rc=rc@entry=1) at /data/jenkins/workspace/GNU-toolchain/arm-11/src/newlib-cygwin/libgloss/libnosys/_exit.c:16
#1 0x013a in abort () at /data/jenkins/workspace/GNU-toolchain/arm-11/src/newlib-cygwin/newlib/libc/stdlib/abort.c:59
#2 0x0128 in operator new (sz=65000) at /data/jenkins/workspace/GNU-toolchain/arm-11/src/gcc/libstdc++-v3/libsupc++/new_op.cc:54
#3 0x0106 in operator new[] (sz=<optimized out>) at /data/jenkins/workspace/GNU-toolchain/arm-11/src/gcc/libstdc++-v3/libsupc++/new_opv.cc:32
#4 0x00fe in operator new[] (sz=<optimized out>) at /data/jenkins/workspace/GNU-toolchain/arm-11/src/gcc/libstdc++-v3/libsupc++/new_opvnt.cc:38
#5 0x0034 in main () at test.cpp:17
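The "null-returning helper function" idea suggested earlier in this report could look roughly like the sketch below. It is an assumption about one possible shape, not actual or proposed libstdc++ code: alloc_or_null, regular_alloc, nothrow_alloc and simulate_out_of_memory are hypothetical names, and the new-handler loop of the real operator new is left out for brevity.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical test hook: set to true to simulate an exhausted heap.
bool simulate_out_of_memory = false;

// Hypothetical shared helper: returns null on failure and never throws
// or aborts, so both operators below can build on it.
void* alloc_or_null(std::size_t size) noexcept {
    if (size == 0)
        size = 1;  // always request at least one byte
    return simulate_out_of_memory ? nullptr : std::malloc(size);
}

// Stand-in for the regular operator new: throws when exceptions are
// available, aborts otherwise (as a -fno-exceptions build would).
void* regular_alloc(std::size_t size) {
    void* p = alloc_or_null(size);
    if (p == nullptr) {
#if defined(__cpp_exceptions)
        throw std::bad_alloc();
#else
        std::abort();
#endif
    }
    return p;
}

// Stand-in for the nothrow operator new: uses the helper directly, so it
// can return null even when the regular version would abort.
void* nothrow_alloc(std::size_t size) noexcept {
    return alloc_or_null(size);
}
```

The trade-off is the one named in the report: nothrow_alloc no longer goes through regular_alloc, so a user replacement of the regular version would not be picked up (requirement 2 is given up in favour of requirement 1).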
[Bug libstdc++/106477] With -fno-exception operator new(nothrow) aborts instead of returning null
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106477 --- Comment #1 from Matthijs Kooijman --- Created attachment 53382 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53382&action=edit Testcase - main code
[Bug libstdc++/106477] With -fno-exception operator new(nothrow) aborts instead of returning null
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106477 --- Comment #2 from Matthijs Kooijman --- Created attachment 53383 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53383&action=edit Testcase - startup code
[Bug libstdc++/106477] With -fno-exception operator new(nothrow) aborts instead of returning null
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106477 --- Comment #3 from Matthijs Kooijman --- Created attachment 53384 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53384&action=edit Testcase - linker script for ATSAMD21G18 (Arduino zero)
[Bug libstdc++/68210] nothrow operator fails to call default new
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68210

Matthijs Kooijman changed:

What|Removed |Added
CC| |matthijs at stdin dot nl

--- Comment #8 from Matthijs Kooijman ---

Note that in comment:2, Jonathan Wakely pointed out a caveat:

> Also we certainly don't want to conform to the new requirement when
> libstdc++ is built with -fno-exceptions, because allocation failure
> would abort in operator new(size_t) and so the nothrow version never
> gets a chance to handle the exception and return null.

But this was not taken into account when implementing the fix for this issue, meaning nothrow operators are now effectively useless with -fno-exceptions (and there is thus no way to handle allocation failure other than aborting in that case).

I created a new bug report about this here: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106477
[Bug target/103698] [12 regression] Code assigned to __attribute__((section(".data"))) generates invalid dwarf: leb128 operand is an undefined symbol
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103698

Matthijs Kooijman changed:

What|Removed |Added
CC| |matthijs at stdin dot nl

--- Comment #4 from Matthijs Kooijman ---

I also ran into this problem, with an STM32 codebase that uses libopencm3 (for peripheral code and the linker script) and uses section(".data") to put a bit of code in RAM (to prevent flash reads while programming flash).

To fix this problem in my code, I switched from section(".data") to section(".ramtext"), which is a second section that is also put into RAM and seems intended especially for this purpose. This works with the libopencm3 linker script, which defines this section; YMMV with other linker scripts.

E.g. from https://github.com/libopencm3/libopencm3/blob/189017b25cebfc609a6c1a5a02047691ef845b1b/ld/linker.ld.S#L136:

.data : {
	_data = .;
	*(.data*)	/* Read-write initialized data */
	*(.ramtext*)	/* "text" functions to run in ram */
	. = ALIGN(4);
	_edata = .;
} >ram AT >rom

From looking at the linker script, it seems that .data and .ramtext are treated pretty much in the same way, so I suspect that there is something else (maybe some builtin rules in gcc/ld/as) that makes the data section special in a way that causes this problem to be triggered.

Hopefully this is helpful for anyone else running into this same problem.
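For reference, the source-side change described above amounts to something like the following sketch. The function name add_from_ram is hypothetical and only stands in for whatever code has to run from RAM, and the noinline attribute is an addition here (an assumption, to keep the body from being inlined into a caller that lives in flash). Whether the function really ends up in RAM depends entirely on the linker script placing .ramtext there, as the libopencm3 one does.

```cpp
// Ask GCC to emit this function into the ".ramtext" input section instead
// of the default text section. With a linker script that maps .ramtext into
// RAM, the startup code copies it there together with initialized data.
__attribute__((section(".ramtext"), noinline))
int add_from_ram(int a, int b) {
    return a + b;
}
```

On a hosted ELF target the same code still compiles and runs, since the linker simply places the unknown section next to other code. That can be handy for a quick sanity check, but it says nothing about placement on the microcontroller.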
[Bug middle-end/26724] __builtin_constant_p fails to recognise function with constant return
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=26724

Matthijs Kooijman changed:

What|Removed |Added
CC| |matthijs at stdin dot nl

--- Comment #5 from Matthijs Kooijman ---

I also ran into this problem in an embedded project and the workaround also works for me - thanks!

I had already made a short testcase in godbolt for this before I found this report. I'll share it here just in case it is useful for testing this problem later: https://godbolt.org/z/s1eK6a3Pf

Here's the code:

#include <cstdio>

// I added always_inline to see if that would help - seems to make no difference
//[[gnu::always_inline]]
static inline bool always_true() __attribute__((always_inline));

static inline bool always_true() {
  return true;
}

static constexpr inline bool constexpr_always_true() {
  return true;
}

int main() {
  printf("DIRECT: %d\n", __builtin_constant_p(always_true()));
  bool var = always_true();
  printf("VIAVAR: %d\n", __builtin_constant_p(var));
  printf("CONSTEXPR: %d\n", __builtin_constant_p(constexpr_always_true()));
}

Gcc 12.2 outputs:

DIRECT: 0
VIAVAR: 1
CONSTEXPR: 1

Some additional observations:
 - clang seems to behave the same as gcc here
 - Adding constexpr to the function definition also fixes the problem without the workaround (but might not always be useful - constexpr has stricter requirements than a __builtin_constant_p test).
 - Adding always_inline attributes makes no difference.