[Bug rtl-optimization/10837] noreturn attribute causes no sibling calling optimization
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=10837

--- Comment #17 from Lukas Grätz ---
(In reply to Xi Ruoyao from comment #16)
> (In reply to gooncreeper from comment #15)
> > May I suggest we just add something like __attribute__((trace)) for the
> > special abort case? Noreturn was added for code optimization after all, not
> > for backtracing.
>
> It will break any attempts to debug an abort until the libc headers are
> updated to use __attribute__((trace)).

"any attempts"? We could simply use the gdb debugger and ignore the backtrace. In comparison, the backtrace is a rather restricted debugging instrument. If there are applications that really depend on GCC's backtrace, that would be a reason to keep the current behaviour.

> Note that in GCC noreturn has been added far before the WG14 _Noreturn paper
> (even this ticket predates the WG14 paper), so the rationale in the paper
> may not apply.

Backtracing functionality is highly platform dependent, so it is no surprise that the C standard cannot guarantee anything about it.

> In practice most _Noreturn functions are abort, exit, ..., i.e. they are
> only executed one time so optimizing against a cold path does not help much.
> I don't think it's a good idea to encourage people to construct some fancy
> code by a recursive _Noreturn function (why not just use a loop?!)

... and why not just if and goto? Because it is considered good programming practice to structure source code into functions (not too long) and loops. If a function gets too big, GCC might not optimize it well.

> And if
> you must write such fancy code anyway IMO musttail attribute (PR83324) will
> be a better solution.

I agree.
[Bug rtl-optimization/10837] noreturn attribute causes no sibling calling optimization
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=10837

--- Comment #18 from Lukas Grätz ---
On second thought: I think something like -fignore-backtrace could be a reasonable optimization flag (enabled by default for -O4). By ignoring the backtrace we could do other optimizations for size and speed, like the one in this ticket and its duplicates.

There are use cases for that, see some of the duplicate tickets. For example, in PR56165 they didn't want to support any debugging at all. And even if you want debugging, you might want to disregard backtraces and use a more sophisticated debugging device.

This is independent of the musttail attribute: with -fignore-backtrace we would leave GCC more freedom to optimize.
[Bug rtl-optimization/10837] noreturn attribute causes no sibling calling optimization
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=10837

--- Comment #20 from Lukas Grätz ---
(In reply to Petr Skocik from comment #19)
> IMO(In reply to Xi Ruoyao from comment #16)
> >
> > In practice most _Noreturn functions are abort, exit, ..., i.e. they are
> > only executed one time so optimizing against a cold path does not help much.
> > I don't think it's a good idea to encourage people to construct some fancy
> > code by a recursive _Noreturn function (why not just use a loop?!) And if
> > you must write such fancy code anyway IMO musttail attribute (PR83324) will
> > be a better solution.
>
> There's also longjmp, which may not be all that super cold and may be
> executed multiple times. And while yeah, nobody will notice a single call vs
> jmp time save against a process spawn/exit, for a longjmp wrapper, it'll
> make it a few % faster (as would utilizing _Noreturn attributes for better
> register allocation: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114097,
> which would also save a bit of codesize too). Taillcalls can also save a bit
> of codesize if the target is near.

Just to emphasize, tail call optimization is not just about speed. It is essential to avoid wasting stack space. In particular, to avoid potential stack overflows, it should _not_ be necessary to replace all recursion with loops, as Xi Ruoyao suggests.

Ah, and I also think that recursion in C is not fancy (anymore), since everyone expects the compiler to do sibcall or similar optimizations. Noreturn functions are the exception to that. So it would indeed be consistent to do sibcall optimization for noreturn functions, too!

Personally, I would be satisfied with the new attribute musttail to enforce tail calls whenever necessary (given that this will be available for C, not C++ only). But speed-wise, musttail might not have the desired effect. It is meant for preserving stack space.
---

Following Petr Skocik, I quick-tested on my computer:

= longjmp_wrapper.c =
#include <setjmp.h>
__attribute__((noreturn))
void longjmp_wrapper(jmp_buf env, int val) {
    longjmp(env, val);
}

= longjmp_main.c =
#include <limits.h>
#include <setjmp.h>
__attribute__((noreturn))
void longjmp_wrapper(jmp_buf env, int val);
int main(void) {
    jmp_buf env;
    for (int i = 0; i < INT_MAX; i++) {
        if (setjmp(env) == 0) {
            longjmp_wrapper(env, 1);
        }
    }
}
=

After compiling with

$ gcc -O3 -m32 -c -S longjmp_wrapper.c -o longjmp_wrapper.S

I copied and manually modified the generated longjmp_wrapper.S as follows:

9,15c9
<       subl    $20, %esp
<       .cfi_def_cfa_offset 24
<       pushl   28(%esp)
<       .cfi_def_cfa_offset 28
<       pushl   28(%esp)
<       .cfi_def_cfa_offset 32
<       call    longjmp
---
>       jmp     longjmp

Then I compiled both versions with longjmp_main.c, again with -m32. Measured with "time", the sibcall version took around 23.5 sec and the unmodified version around 24.5 sec on my computer. So around 4 % improvement for 32 bit x86. For 64 bit x86, both took around 18 secs without a noticeable speed difference (perhaps because the 64 bit calling conventions pass both arguments in registers instead of on the stack).
[Bug rtl-optimization/38534] gcc 4.2.1 and above: No need to save called-saved registers in 'noreturn' function
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=38534

--- Comment #29 from Lukas Grätz ---
(In reply to Jakub Jelinek from comment #28)
> (In reply to Lukas Grätz from comment #9)
> > Well it is not my testcase. But I added backtracing and observed that the
> > printed backtrace is unchanged with your patch. The new
> > no_return_to_caller():
>
> You haven't tried hard enough.

That might be true.

> Consider the testcase I've posted to the mailing list, built with -Og -g.
> The gcc trunk hits the backtrace not possible problem because rbp is
> clobbered and needed in upper frame CFA computation:

Yes, when a backtrace is based on rbp, one needs -fno-omit-frame-pointer. I trusted comment #10 here, as it made sense.

> And in the patched gcc (with PR114116 patch to save bp register) backtrace
> works but several of the values are bogus:
> #2 0x004011d2 in baz (a=a@entry=42, b=b@entry=43, c=c@entry=44,
> d=d@entry=-559038737, e=e@entry=-559038737, f=f@entry=-559038737, g=48,
> h=49) at /tmp/1.c:38

glibc's backtrace() function and friends only report function names and addresses. This looks like the gdb bt command. I admit, I did not take a proper look at that before.

I believe this could and should somehow be fixed by adding DWARF info that certain callee-saved registers (= the function parameter values) were overwritten. The corrected backtrace could look something like this:

#2 0x004011d2 in baz (a=42, b=43, c=44, d=<optimized out>, e=<optimized out>, f=<optimized out>, g=48, h=49) at /tmp/1.c:38

Some parameters would be <optimized out>, and this would be fine because the code was partially compiled with -O2. It is not unusual to have <optimized out> parameter values in gdb's bt.

> So, I think we should limit this to -fno-unwind-tables or maybe
> -mcmodel=kernel.

Now I am confused. The optimization is limited to -fexceptions. And the documentation of -funwind-tables says "Similar to -fexceptions, except". So shouldn't -funwind-tables behave similarly to -fexceptions? I don't see anything kernel-specific here.
[Bug rtl-optimization/38534] gcc 4.2.1 and above: No need to save called-saved registers in 'noreturn' function
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=38534

--- Comment #31 from Lukas Grätz ---
(In reply to Jakub Jelinek from comment #30)
> (In reply to Lukas Grätz from comment #29)
> > Yes, when a backtrace is based on rbp, one needs -fno-omit-frame-pointer. I
> > trusted comment #10 here, as it made sense.
>
> See PR114116.
>
> > glibc's backtrace() function and friends only reports function names and
> > addresses. This looks like the gdb bt command. I admit, I did not take a
> > proper look into that before.
>
> Yes, it is gdb bt. And it is what people heavily rely on for debugging, if
> something fails an assertion or aborts etc., they want to figure out why.

True.

> > I belief this could and should be somehow be fixed by adding DWARF info that
> > certain callee-saved registers (= the function parameter values) were
> > overwritten. The corrected backtrace could look something like this:
>
> That can be arranged by emitting those .cfi_undefined directives...
>
> > #2 0x004011d2 in baz (a=42, b=43, c=44, d=<optimized out>,
> > e=<optimized out>, f=<optimized out>, g=48, h=49) at /tmp/1.c:38
>
> ... but really will not help users to debug/fix their code.

Even when I compile a simple program with gcc -O2 -g:

#include <stdlib.h>
int main(int argc, char** argv) {
    abort();
}

I still get an "argc=<optimized out>":

(gdb) bt
#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1 0x77dcd859 in __GI_abort () at abort.c:79
#2 0x00401046 in main (argc=<optimized out>, argv=<optimized out>) at simple.c:4

Yes, for better debugging, it would be nice if optimised code were just not optimised... But this goes against optimization.

> > > So, I think we should limit this to -fno-unwind-tables or maybe
> > > -mcmodel=kernel.
> > Now I am confused. The optimization is limited to -fexceptions. And the
> > documentation of -funwind-tables says "Similar to -fexceptions, except". So
> > shouldn't -funwind-tables behave similar to -fexceptions? I don't see
> > anything kernel-specific here.
>
> Given that even with -fno-asynchronous-unwind-tables (or -fno-unwind-tables)
> gcc emits
> the unwind info, just not into .eh_frame but .debug_frame, we shouldn't
> disable it
> just when not emitting .eh_frame, but should just disable it always.
> There is a reason why it has been rejected years ago.
> If anything, guard it with some non-default -m* option and explain the
> consequences to users if they use it. Still, the guarding IMHO should be
> done on top of the PR114116
> change, because having random crashes from backtrace or gdb bt even when
> user asked for it is a bad idea.

Yes, it is a bad idea to have crashes from backtrace or gdb. But when this is only about <optimized out>, I don't see the point of disabling it always.
[Bug rtl-optimization/38534] gcc 4.2.1 and above: No need to save called-saved registers in 'noreturn' function
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=38534

--- Comment #33 from Lukas Grätz ---
(In reply to Jakub Jelinek from comment #32)
> (In reply to Lukas Grätz from comment #31)
> > Even when I compile a simple program with gcc -O2 -g:
> >
> > #include <stdlib.h>
> > int main(int argc, char** argv) {
> > abort();
> > }
> >
> >
> > I still get an "argc=<optimized out>":
>
> Sure, debugging info in optimized code is best effort.
>
> > Yes, for a better debugging, it would be nice if optimised code would just
> > not be optimised... But this goes against optimization.
>
> The significant difference between other optimizations and this one is
> that normal optimizations affect the debuggability of the optimized function.
> This one affects the debuggability of all callers as well, even if they are
> compiled in a way that should make them more debuggable.
> Normally, if debugging optimized code doesn't work out, one can simply
> rebuild that code with -O0 or -Og to make it more debuggable.
> Here one would also need to rebuild all the shared libraries it uses.

When the debugger is stopped inside the debuggable function compiled with -O0 or -Og, we see all parameters and current variable values. However, in the bt example, we are stopped in another function, so the parameters are only available on a best-effort basis.

I just noticed that for my simple.c example above, I get "argc=<optimized out>" even with -Og. However, when the breakpoint is somewhere else,

(gdb) break main
(gdb) run
(gdb) bt

I get the correct "argc=1". The same applies to your example with "break baz". It is just not guaranteed that gdb is able to reconstruct function parameters when we are in some other function.
[Bug rtl-optimization/38534] gcc 4.2.1 and above: No need to save called-saved registers in 'noreturn' function
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=38534

--- Comment #36 from Lukas Grätz ---
(In reply to Jakub Jelinek from comment #35)
> If I hand edit the gcc trunk + PR114116 patch assembly, add to bar
> + .cfi_undefined 3
> + .cfi_undefined 12
> + .cfi_undefined 13
> + .cfi_undefined 14
> + .cfi_undefined 15
> then bt in gdb shows
> #2 0x004011d2 in baz (a=a@entry=42, b=b@entry=43, c=c@entry=44,
> d=<error reading variable: value has been optimized out>,
> e=<error reading variable: value has been optimized out>, f=<error
> reading variable: value has been optimized out>, g=48, h=49) at /tmp/1.c:38

I can confirm that. What bothers me is the wording "d=<error reading variable: value has been optimized out>" and not just "d=<optimized out>".

(gdb) run
Starting program: bar-artificial-mod
Program received signal SIGABRT, Aborted.
(gdb) bt
#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1 0x77dcd859 in __GI_abort () at abort.c:79
#2 0x004011b1 in bar () at bar-artificial.c:30
#3 0x004011d2 in baz (a=a@entry=42, b=b@entry=43, c=c@entry=44, d=<error reading variable: value has been optimized out>, e=<error reading variable: value has been optimized out>, f=<error reading variable: value has been optimized out>, g=48, h=49) at bar-artificial.c:38
#4 0x004012aa in qux () at bar-artificial.c:55
#5 0x004012e4 in main () at bar-artificial.c:62
(gdb) p a
No symbol "a" in current context.
(gdb) p b
No symbol "b" in current context.

> and everything in qux live across the call is as well,
> (gdb) p $r12
> $10 =
> etc. while without that
> (gdb) p a
> $1 =
> (gdb) p b
> $2 =
> (gdb) p c
> $3 =
> (gdb) p d
> $4 = -559038737
> (gdb) p e
> $5 = -559038737
> (gdb) p f
> $6 = -559038737
> (gdb) p g
> $7 = -559038737
> (gdb) p h
> $8 = -559038737
> (gdb) p $r12
> $9 = 3735928559

Where did you set the breakpoint? When I set it somewhere in qux (after a, b, c, ... were initialized), I get conclusive results:

(gdb) break bar-artificial.c:52
Breakpoint 1 at 0x40124a: file bar-artificial.c, line 52.
(gdb) run
Breakpoint 1, qux () at bar-artificial.c:52
52        corge (__builtin_alloca (foo (52)));
(gdb) p a
$1 = 42
(gdb) p b
$2 = 43
(gdb) p c
$3 = 44
(gdb) p d
$4 = 45
(gdb) p e
$5 = 46
(gdb) p f
$6 = 47
(gdb) p g
$7 = 48
(gdb) p h
$8 = 49
(gdb) p $r12
$9 = 46
[Bug rtl-optimization/38534] gcc 4.2.1 and above: No need to save called-saved registers in 'noreturn' function
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=38534

--- Comment #38 from Lukas Grätz ---
(In reply to Jakub Jelinek from comment #37)
> Nowhere, just run and when it stops due to abort, just up several times
> until reaching the appropriate frame.

I see, this gives me:

(gdb) frame 4
#4 0x004012aa in qux () at bar-artificial.c:55
55        baz (a, b, c, d, e, f, g, h);
(gdb) p a
$1 = 42
(gdb) p b
$2 = 43
(gdb) p c
$3 = 44
(gdb) p d
$4 =
(gdb) p e
$5 =
(gdb) p f
$6 =
(gdb) p g
$7 =
(gdb) p h
$8 =
(gdb) p $r12
$9 =

I checked the DWARF:

$ llvm-dwarfdump bar-artificial-mod
[...]
0x009f: DW_TAG_subprogram
          DW_AT_external (true)
          DW_AT_name ("qux")
          DW_AT_decl_line (42)
          DW_AT_prototyped (true)
          DW_AT_low_pc (0x004011d2)
          DW_AT_high_pc (0x004012db)
          DW_AT_frame_base (DW_OP_call_frame_cfa)
          DW_AT_call_all_calls (true)
          DW_AT_sibling (cu + 0x02f0)
[...]
0x00ee: DW_TAG_variable
          DW_AT_name ("d")
          DW_AT_decl_line (47)
          DW_AT_decl_column (0x07)
          DW_AT_type (cu + 0x0060 "int")
[...]

$ objdump -W bar-artificial-mod
[...]
<2>: Abbrev Number: 2 (DW_TAG_variable)
    DW_AT_name: d
    DW_AT_decl_file : 1
    DW_AT_decl_line : 47
    DW_AT_decl_column : 7
    DW_AT_type: <0x60>
    DW_AT_location: 0x6e (location list)
    DW_AT_GNU_locviews: 0x6a
[...]
Contents of the .debug_loclists section:
[...]
006e v000 v000 views at 006a for:
     00401216 0040121f (DW_OP_reg0 (rax))
0075 v000 v000 views at 006c for:
     0040121f 004012d1 (DW_OP_reg3 (rbx))
007c
[...]

The problem is that we are not within the loclist range. So in principle, we cannot get the value of the variable; the variable is just not visible. But since gdb is very clever, it checks whether the value of rax or rbx from within the loclist range was preserved somewhere. And apparently, for the version without the patch, the value of rbx was saved. For the optimized version with the patch, rbx was not saved, so the value could not be reconstructed. In my opinion, it is just a fancy extra that gdb can do that.

Coming back to the "simple.c" example:

$ objdump -W simple
[...]
<2>: Abbrev Number: 3 (DW_TAG_formal_parameter)
    DW_AT_name: (indirect string, offset: 0x85): argc
    DW_AT_decl_file : 1
    DW_AT_decl_line : 2
    DW_AT_decl_column : 14
    DW_AT_type: <0x35>
    DW_AT_location: 0x10 (location list)
    DW_AT_GNU_locviews: 0xc
[...]
Contents of the .debug_loclists section:
[...]
0010 v000 v000 views at 000c for:
     00401126 0040112e (DW_OP_reg5 (rdi))
0015 v000 v000 views at 000e for:
     0040112e 0040112f (DW_OP_entry_value: (DW_OP_reg5 (rdi)); DW_OP_stack_value)
001d
[...]

And rdi was saved nowhere, regardless of the patch. So gdb could not reconstruct the value of argc.
[Bug debug/114144] New: Variables optimized out by -Og
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114144

            Bug ID: 114144
           Summary: Variables optimized out by -Og
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: debug
          Assignee: unassigned at gcc dot gnu.org
          Reporter: lukas.gra...@tu-darmstadt.de
  Target Milestone: ---

-Og seems to optimize out some variable values: On x86-64, function parameters are lost after calling other functions, because they were not saved (to the stack). This could be undesired when debugging. Consider the following example:

= foo.c =
int foo (int i) {
    return i + 1;
}

= caller.c =
extern int foo(int);
int main(int argc, char **argv) {
    int i = foo(argc);
    int v = foo(i);
    return v;
}

Compile on x86-64:

$ gcc -Og -g caller.c foo.c -o caller

Debug with a breakpoint after calling foo():

$ gdb caller
Reading symbols from caller...
(gdb) break caller.c:5
Breakpoint 1 at 0x401116: file caller.c, line 5.
(gdb) run
Starting program: /home/lukas/test/caller
Breakpoint 1, main (argc=<optimized out>, argv=<optimized out>) at caller.c:5
5 return v;
(gdb) print argc
$1 = <optimized out>
(gdb) print i
$2 = <optimized out>
(gdb) print v
$3 = 3

---

For 32-bit x86 with "-m32 -Og -g" we would get argc and argv, because the calling conventions put them on the stack and not in a caller-saved register. However, the variable i would still be <optimized out> at the breakpoint.

---

EXPECTED RESULT by compiling the same with -O0:

$ gcc -O0 -g caller.c foo.c -o caller
$ gdb caller
(gdb) break caller.c:5
Breakpoint 1 at 0x40112f: file caller.c, line 5.
(gdb) run
Starting program: /home/lukas/test/caller
Breakpoint 1, main (argc=1, argv=0x7fffdc98) at caller.c:5
5 return 0;
(gdb) print argc
$1 = 1
(gdb) print i
$2 = 2
(gdb) print v
$3 = 3

---

This is not a problem of the debugger gdb: By looking at the DWARF debugging info, it turns out that argc has indeed been optimised out:

$ objdump -W caller
[...]
<2><6d>: Abbrev Number: 1 (DW_TAG_formal_parameter)
   <6e> DW_AT_name: (indirect string, offset: 0x11): argc
   <72> DW_AT_decl_file : 1
   <72> DW_AT_decl_line : 2
   <72> DW_AT_decl_column : 14
   <73> DW_AT_type: <0x44>
   <77> DW_AT_location: 0x10 (location list)
   <7b> DW_AT_GNU_locviews: 0xc
[...]
Contents of the .debug_loclists section:
[...]
0010 v000 v000 views at 000c for:
     00401106 0040110e (DW_OP_reg5 (rdi))
0015 v000 v000 views at 000e for:
     0040110e 0040111b (DW_OP_entry_value: (DW_OP_reg5 (rdi)); DW_OP_stack_value)
001d
[...]

Our breakpoint at location 0x401116 is inside the second range of the loclist. However, we cannot compute "DW_OP_entry_value: (DW_OP_reg5 (rdi))" at 0x401116, since it refers to a state at a previous location (the value of rdi at the subprogram entry).

Also, from the disassembly you can clearly see that the caller-saved registers %edi and %eax are not saved (neither to the stack nor to callee-saved registers) before calling foo:

$ objdump -d caller
[...]
00401106 <main>:
  401106: 48 83 ec 08      sub    $0x8,%rsp
  40110a: e8 0c 00 00 00   callq  40111b <foo>
  40110f: 89 c7            mov    %eax,%edi
  401111: e8 05 00 00 00   callq  40111b <foo>
  401116: 48 83 c4 08      add    $0x8,%rsp
  40111a: c3               retq
[...]

And if you don't like breakpoints, you could modify caller.c as follows to automatically break by calling abort:

= caller2.c =
#include <stdlib.h>
extern int foo(int);
int main(int argc, char **argv) {
    int i = foo(argc);
    int v = foo(i);
    abort();
}
=

$ gcc -Og -g caller2.c foo.c -o caller2
$ gdb ./caller2
(gdb) run
Program received signal SIGABRT, Aborted.
(gdb) bt
#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1 0x77dcd859 in __GI_abort () at abort.c:79
#2 0x0040113b in main (argc=<optimized out>, argv=<optimized out>) at caller2.c:6

Here, too, we have argc=<optimized out>.

--

According to the documentation of -Og:

"-Og should be the optimization level of choice for the standard edit-compile-debug cycle, offering a reasonable level of optimization while maintaining fast compilation and a good debugging experience."

"It is a better choice than -O0 for producing debuggable code because some compiler passes that collect debug information are disabled at -O0."

But for many cases, -O0 currently seems to be the better choice. Because -O0 saves all values to the stack, they will not be <optimized out>. If -Og is working as
[Bug rtl-optimization/38534] gcc 4.2.1 and above: No need to save called-saved registers in 'noreturn' function
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=38534

--- Comment #40 from Lukas Grätz ---
(In reply to Jakub Jelinek from comment #30)
> (In reply to Lukas Grätz from comment #29)
> > I belief this could and should be somehow be fixed by adding DWARF info that
> > certain callee-saved registers (= the function parameter values) were
> > overwritten. The corrected backtrace could look something like this:
>
> That can be arranged by emitting those .cfi_undefined directives...
>
> > #2 0x004011d2 in baz (a=42, b=43, c=44, d=<optimized out>,
> > e=<optimized out>, f=<optimized out>, g=48, h=49) at /tmp/1.c:38
>
> ... but really will not help users to debug/fix their code.

It seems that the reason for <optimized out> is ultimately -Og, not this patch. See Bug 78685. When compiling and debugging your program with -O0 instead, there is not a single <optimized out>.
[Bug rtl-optimization/38534] gcc 4.2.1 and above: No need to save called-saved registers in 'noreturn' function
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=38534

--- Comment #42 from Lukas Grätz ---
(In reply to Jakub Jelinek from comment #41)
> (In reply to Lukas Grätz from comment #40)
> > It seems that the reason for <optimized out> is ultimately -Og, not this
> > patch. See Bug 78685.
>
> No. When PR78685 would be fixed by adding artificial hidden uses of
> variables at the end of their scopes, this bug would trigger far more often.
> The vars would be live across the calls, so if there would be callee-saved
> registers available, the compiler
> would use them to hold the variables across the calls. And this bug would
> break that.

It could be done that way. But I think a better fix for PR78685 would be to save the function parameter values to the stack (and then this problem will not trigger that often), for the following reasons:

(1) Timings for push and mov instructions are similar, so the execution speed wouldn't be much affected.

(2) A callee needs to somehow restore callee-saved registers, but only if it returns. So the calling conventions cannot guarantee that callee-saved registers are saved somewhere for noreturn functions. But of course, if you disregard this optimization, this would not trigger that often.

(3) Potential register pressure when saving additional variables to callee-saved registers: If the execution itself no longer needs the value of a function parameter, there is no need to hold it in a (callee-saved) register across calls for quick access. The stack is sufficient for accessing the values with the debugger.

(4) The entry values of function parameters should be more helpful than some later values. E.g., for

int foo(int i) {
    if (i == 42) { h(); }
    i = 7;
    bar();
}

we would be more interested in the original value of "i" and not the later value "i = 7" as saved by "artificial hidden uses of variables at the end of their scopes". By saving original values to the stack before they are modified, we can keep inspecting the original values. The helpful backtrace from within bar() could be:

#1 bar()
#2 foo(i@entry=42)

The other version would be a bit counter-intuitive when the argument to foo really was i=42:

#1 bar()
#2 foo(i=7)

Btw., function parameters are not normally part of the backtrace (this is just a nice gdb feature), see Wikipedia: https://en.wikipedia.org/wiki/Stack_trace

> Anyway, I've posted
> https://gcc.gnu.org/pipermail/gcc-patches/2024-February/646649.html
> patch which will not revert the #c15/#c24 changes, but guard them with a
> non-default option. People who don't care about the harder debugging can
> use that option in their code, but widely used shared libraries with
> noreturn entrypoints will no longer screw up the debugging for all the
> packages that use them.

Yes, it took me a long time, but I agree: it would be better not to worsen the debugging experience.
[Bug rtl-optimization/38534] gcc 4.2.1 and above: No need to save called-saved registers in 'noreturn' function
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=38534

--- Comment #43 from Lukas Grätz ---
(In reply to Lukas Grätz from comment #42)
> (In reply to Jakub Jelinek from comment #41)
> >
> > No. When PR78685 would be fixed by adding artificial hidden uses of
> > variables at the end of their scopes, this bug would trigger far more often.
> > The vars would be live across the calls, so if there would be callee-saved
> > registers available, the compiler
> > would use them to hold the variables across the calls. And this bug would
> > break that.
>
> It could be done that way. But I think a better fix for PR78685 would be to
> save the function parameter values to the stack (and than this problem will
> not trigger that often). For the following reasons:
>

Just to complete the list of arguments:

(5) Artificial hidden uses of variables at the end of their scopes would not always help when variables are overwritten. For example:

int main (int argc, char **argv) {
    if (argc == 42) { h(); }
    might_not_return(0);
    argc = bar();
    // here would be the hidden use of argc and argv
}

The "artificial hidden use" approach would only save the last value of argc, here the result of bar() in line 4, and not the argument argc. The argument value of argc is not used from line 3 on. So that approach would still produce a backtrace with argc=<optimized out>, something like:

#1 might_not_return(i=0)
#2 main (argc=<optimized out>, argv=0x7fffe0)

(6) When the goal is just to have a more helpful gdb bt output, we don't need to save any variables other than function parameters. In the original example in Bug 78685 and in Comment 28 here, this seemed to be the main goal: to make gdb bt more conclusive. If one is interested in other variable values too, -O0 might be better than trying hard to patch -Og to save all variable values.

(7) Bug 78685 is for x86-64 with -Og. For 32 bit x86 with -Og, we don't run into that problem: there are no <optimized out> function parameters, since they are already on the stack by the 32 bit calling conventions. So saving parameters on the stack for -Og on x86-64 and similar targets without stack parameters would just be consistent.
[Bug target/114116] [14 Regression] Broken backtraces in bootstrapped x86_64 gcc
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114116

Lukas Grätz changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |lukas.graetz@tu-darmstadt.de

--- Comment #7 from Lukas Grätz ---
(In reply to H.J. Lu from comment #6)
> (In reply to Jakub Jelinek from comment #5)
> > Yeah. Not to mention, one can call backtrace even if -g0; you just don't
> > get nice names for the addresses. Without the patch you get crashes in the
> > unwinder when doing backtrace.
>
> Should we generate REG_CFA_UNDEFINED for unsaved callee-saved registers to
> help unwinder:
>
> https://patchwork.sourceware.org/project/gcc/list/?series=30327

Yes. This is also needed for gdb.

Perhaps I did something wrong. On my computer, I could get the first patch (to save rbp) working, and I also applied the patch which should emit the .cfi_undefined. But somehow, I still do not get .cfi_undefined for any of the examples.

$ ./gcc/host-x86_64-pc-linux-gnu/gcc/cc1 -O3 gcc/gcc/testsuite/gcc.target/i386/pr38534-7.c -o pr38534-7.S

$ cat pr38534-7.S
[...]
no_return_to_caller:
.LFB0:
        .cfi_startproc
        pushq   %rbp
        .cfi_def_cfa_offset 16
        .cfi_offset 6, -16
        movl    $array+67108860, %eax
        xorl    %r13d, %r13d
[...]

The ".cfi_undefined 13" is still missing...
[Bug target/114116] [14 Regression] Broken backtraces in bootstrapped x86_64 gcc
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114116

--- Comment #9 from Lukas Grätz ---
(In reply to H.J. Lu from comment #8)
> (In reply to Lukas Grätz from comment #7)
> > (In reply to H.J. Lu from comment #6)
> > > (In reply to Jakub Jelinek from comment #5)
> > > > Yeah. Not to mention, one can call backtrace even if -g0; you just don't
> > > > get nice names for the addresses. Without the patch you get crashes in
> > > > the unwinder when doing backtrace.
> > >
> > > Should we generate REG_CFA_UNDEFINED for unsaved callee-saved registers to
> > > help unwinder:
> > >
> > > https://patchwork.sourceware.org/project/gcc/list/?series=30327
> >
> > Yes. Also for gdb this is needed.
> >
> > Perhaps I did something wrong. On my computer, I could get the first patch
> > working to save rbp, I also applied the patch which should omit the
> > .cfi_undefined. But somehow, I still not get .cfi_undefined for any of the
> > examples.
> >
> >
> > $ ./gcc/host-x86_64-pc-linux-gnu/gcc/cc1 -O3
> > gcc/gcc/testsuite/gcc.target/i386/pr38534-7.c -o pr38534-7.S
> >
> > $ cat pr38534-7.S
> > [...]
> > no_return_to_caller:
> > .LFB0:
> >         .cfi_startproc
> >         pushq   %rbp
> >         .cfi_def_cfa_offset 16
> >         .cfi_offset 6, -16
> >         movl    $array+67108860, %eax
> >         xorl    %r13d, %r13d
> > [...]
> >
> >
> > The ".cfi_undefined 13" is still missing...
>
> It is generated only when -g is used.

Not on my computer. When I used -g I got:

no_return_to_caller:
.LFB0:
        .loc 1 16 1 view -0
        .cfi_startproc
        .loc 1 17 3 view .LVU1
        .loc 1 18 3 view .LVU2
.LVL0:
        .loc 1 18 26 discriminator 1 view .LVU3
        .loc 1 16 1 is_stmt 0 view .LVU4
        pushq   %rbp
        .cfi_def_cfa_offset 16
        .cfi_offset 6, -16
        movl    $array+67108860, %eax
        .loc 1 21 31 view .LVU5
        xorl    %r13d, %r13d
        .loc 1 16 1 view .LVU6

Still no .cfi_undefined 13. In principle, it should also be generated without -g, as the rest of .cfi_offset and friends.
[Bug target/114116] [14 Regression] Broken backtraces in bootstrapped x86_64 gcc
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114116

--- Comment #11 from Lukas Grätz ---
(In reply to H.J. Lu from comment #10)
> (In reply to Lukas Grätz from comment #9)
> >
> > Not on my computer. When I used -g I got:
> >
> >
> > no_return_to_caller:
> > .LFB0:
> >         .loc 1 16 1 view -0
> >         .cfi_startproc
> >         .loc 1 17 3 view .LVU1
> >         .loc 1 18 3 view .LVU2
> > .LVL0:
> >         .loc 1 18 26 discriminator 1 view .LVU3
> >         .loc 1 16 1 is_stmt 0 view .LVU4
> >         pushq   %rbp
> >         .cfi_def_cfa_offset 16
> >         .cfi_offset 6, -16
> >         movl    $array+67108860, %eax
> >         .loc 1 21 31 view .LVU5
> >         xorl    %r13d, %r13d
> >         .loc 1 16 1 view .LVU6
> >
> >
> > Still no .cfi_undefined 13. In principle, it should also be generated
> > without -g, as the rest of .cfi_offset and friends.
>
> Did you apply my patch? I got
>
>         .globl  no_return_to_caller
>         .type   no_return_to_caller, @function
> no_return_to_caller:
> .LFB0:
>         .file 1 "pr38534-1.c"
>         .loc 1 16 1 view -0
>         .cfi_startproc
>         .loc 1 17 3 view .LVU1
>         .loc 1 18 3 view .LVU2
> .LVL0:
>         .loc 1 18 26 discriminator 1 view .LVU3
>         .loc 1 16 1 is_stmt 0 view .LVU4
>         subq    $24, %rsp
>         .cfi_undefined 15
>         .cfi_undefined 14
>         .cfi_undefined 13
>         .cfi_undefined 12
>         .cfi_undefined 6
> ...

I applied it, double checked, make distclean, configure, make again. But your result seems different. Have you applied Jakub Jelinek's patch to save %rbp? I applied both patches. Perhaps there was some subtle merge-conflict with the two patches.
[Bug rtl-optimization/38534] gcc 4.2.1 and above: No need to save called-saved registers in 'noreturn' function
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=38534

--- Comment #44 from Lukas Grätz ---
(In reply to Tom Tromey from comment #39)
> (In reply to Lukas Grätz from comment #36)
> > > #2 0x004011d2 in baz (a=a@entry=42, b=b@entry=43, c=c@entry=44,
> > > d=<error reading variable: value has been optimized out>,
> > > e=<error reading variable: value has been optimized out>,
> > > f=<error reading variable: value has been optimized out>, g=48, h=49)
> > > at /tmp/1.c:38
> >
> >
> > I can confirm that. What bothers me, is the wording "d=<error reading
> > variable: value has been optimized out>" and not just "d=<optimized out>".
>
> Could you file a gdb bug about this? Preferably with some
> kind of test case?

Done. See: https://sourceware.org/bugzilla/show_bug.cgi?id=31436
[Bug rtl-optimization/38534] gcc 4.2.1 and above: No need to save called-saved registers in 'noreturn' function
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=38534

--- Comment #45 from Lukas Grätz ---
(In reply to Jakub Jelinek from comment #28)
> (In reply to Lukas Grätz from comment #9)
> > Well it is not my testcase. But I added backtracing and observed that the
> > printed backtrace is unchanged with your patch. The new
> > no_return_to_caller():
>
> You haven't tried hard enough.
> Consider the testcase I've posted to the mailing list, built with -Og -g.
> It is artificial in that register pressure is increased artificially rather
> than coming from meaningful code, noipa attribute is used heavily instead of
> functions being too large or in different TUs, and optimize attribute used
> instead of the noreturn function sitting in a different library, built there
> with -O2, while user program say with -Og.

I found a

        movq    %rsp, %rbp
        .cfi_def_cfa_register 6

in the assembler output of your example code in function qux(). After that,
the value of %rsp can only be reconstructed from %rbp. Because qux() contains
an alloca whose size is unknown at compile time, we could not reconstruct
%rsp otherwise.

So I was ultimately wrong: the value of %rbp is needed to construct the
backtrace in some cases. The only option to still get the backtrace is then
to apply your patch to save %rbp (given that .cfi_def_cfa_register always
points to %rbp). But I guess you know that already.
[Bug target/114116] [14 Regression] Broken backtraces in bootstrapped x86_64 gcc
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114116

--- Comment #13 from Lukas Grätz ---
(In reply to H.J. Lu from comment #12)
> (In reply to Lukas Grätz from comment #11)
> >
> > I applied it, double checked, make distclean, configure, make again.
> >
> > But your result seems different. Have you applied Jakub Jelinek's patch to
>
> No.
>
> > save %rbp? I applied both patches. Perhaps there was some subtle
> > merge-conflict with the two patches.
>
> Please try just my patch.

Thanks, that worked!
[Bug target/114116] [14 Regression] Broken backtraces in bootstrapped x86_64 gcc
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114116

--- Comment #14 from Lukas Grätz ---
(In reply to Jakub Jelinek from comment #2)
> Created attachment 57545 [details]
> gcc14-pr114116.patch
>
> This seems to fix it, so far tested just on the small testcase, back to the
> expected backtrace there.

As I said in PR 38534, comment [1], rsp could be saved to rbp because of a
stack frame of unknown size:

        movq    %rsp, %rbp
        .cfi_def_cfa_register 6

Therefore, if we want the backtrace in such situations, we need to save rbp
too, as your patch does.

The patch might not even be enough if .cfi_def_cfa_register can be used with
a register other than rbp/6. In that case, the patch can be dropped and the
remaining option is to disable the optimization by default, as you already
suggested. I think you already have a patch for that.

H.J. Lu's patch to emit .cfi_undefined is needed in any case. However, the
two patches are currently incompatible.

There also seems to be a bug in libgcc/unwind-dw2.c:249, causing a SEGV when
register values are unavailable due to .cfi_undefined. This is already known,
as the comment there suggests. It happens during a call to glibc's
backtrace(), even though the registers are not needed for the backtrace (in
that case, gdb's backtrace is fine, but glibc's backtrace crashes in libgcc).
It should be possible to print best-effort traces without crashing; in fact,
calling backtrace() should never lead to a crash. Bug 103510 might be
related, and this should be fixed independently.

Thanks for the work you put into this, and I am sorry for the mess on my
side!

[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=38534#c45
[Bug c/111643] New: __attribute__((flatten)) with -O1 runs out of memory (killed cc1)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111643

            Bug ID: 111643
           Summary: __attribute__((flatten)) with -O1 runs out of memory
                    (killed cc1)
           Product: gcc
           Version: 13.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: lukas.gra...@tu-darmstadt.de
  Target Milestone: ---

Created attachment 56017
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=56017&action=edit
C file

When I run

    gcc -c -O0 runs_out_of_memory.i -o runs_out_of_memory.o

(see the attached .i file) everything is fine. But when I run

    gcc -c -O1 runs_out_of_memory.i -o runs_out_of_memory.o

then I get:

    gcc: fatal error: Killed signal terminated program cc1

Apparently, cc1 quickly runs out of memory. I have 16 GB of RAM and the
program is rather simple. I tested it with gcc versions 9.4.0, 5.1 and 13.2
(target x86_64-linux-gnu) on Ubuntu 20.04.

I believe the problem is the __attribute__((flatten)) on several functions.

How I created the source file: The code comes from busybox (file
coreutils/expr.c) and musl header files. Additionally, I replaced every
function 'name' with 'name_original' and added a wrapper 'name' with
__attribute__((flatten)), for later instrumentation (I did this with a
script). I used that attribute to reduce the overhead of the wrapper
functions, and I believe this should be fine. I introduced the wrappers in
the first place to allow a fine-grained instrumentation of these functions.
[Bug ipa/111643] __attribute__((flatten)) with -O1 runs out of memory (killed cc1)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111643

--- Comment #3 from Lukas Grätz ---
(In reply to Marc Glisse from comment #2)
> (In reply to Andrew Pinski from comment #1)
> > I am 99% sure this is falls under don't do this as flatten inlines
> > everything it can that the function calls ...
>
> Maybe people end up abusing flatten because we are missing a convenient way
> for a caller to ask that a call be inlined? From the callee, we can use
> always_inline (couldn't this be used on name_original in this testcase?),
> but from the caller... Here even a non-recursive version of flatten would
> have helped.

Yes, this was what I was searching for, but I found only flatten. Also, the
documentation does not mention that flatten is applied recursively, and it
is not what I would expect.

https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html

I don't want to always_inline name_original. What I want is to inline
name_original only when it is called by the wrapper function name, hence the
flatten. Wherever I don't want the instrumentation of the wrapper function
name to apply, I replace the call to name with a call to name_original.

Thanks!
[Bug ipa/111643] __attribute__((flatten)) with -O1 runs out of memory (killed cc1)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111643

--- Comment #4 from Lukas Grätz ---
Sorry, just to clarify whether I understood your two comments correctly.
Should foo() be inlined in the following example because flatten works
recursively?

void foo (void) {
    // CODE
}

int bar_original (void) {
    // CODE
    foo();
    // CODE
}

__attribute__((flatten))
int bar (void) {
    // INSTRUMENTATION CAN GO HERE
    return bar_original();
}

I thought that according to the documentation of flatten, foo() would not be
affected by the flatten attribute of bar(). It says:

"For a function marked with this attribute, every call inside this function
is inlined, if possible."

The call to foo() is not directly inside the function bar(). Only if
bar_original() also had __attribute__((flatten)) would I expect foo() to be
inlined into bar() because of recursive flatten. Of course, it could still
be inlined because of some heuristics...
[Bug ipa/111643] __attribute__((flatten)) with -O1 runs out of memory (killed cc1)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111643

--- Comment #9 from Lukas Grätz ---
Thanks for everything. It seems to have been a misunderstanding on my side
anyway, and the documentation fix should help others. I am sorry for being
silent, I was sick for a few days.

As for my original problem, I am thinking of opening a new report, because I
realized there could be another solution without flatten. To explain a bit
more: we have bar_original() and bar_new(). The latter should behave
identically to the former except for one additional statement, the
"instrumentation". Since the instrumentation can be done in only two
assembler instructions, the overhead of bar_new() calling bar_original() is
not negligible.

int bar_original (int x) {
    /* CODE */
}

unsigned int trace_buffer[512];
uint8_t trace_pos;
#define FUNCTION_NUMBER_bar 0x686

int bar_new (int x) {
    trace_buffer[trace_pos++] = FUNCTION_NUMBER_bar; // instrumentation
    return bar_original(x);
}

My idea: Do not touch the stack inside bar_new() and replace the call in
bar_new() with a jump or, better, a fall-through to bar_original(). This is
possible because both functions have the same signature. It could save
around 4 instructions and some stack memory. I have a lot of such functions
after my instrumentation step.

I also wondered whether

int bar_alias (void) { return bar_original(); }

could be a portable alternative to attribute alias. Except that current GCC
does not translate it that way.
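
The wrapper scheme from this comment can be written out as a small,
compilable sketch. The names bar_original, bar_new, trace_buffer, trace_pos
and FUNCTION_NUMBER_bar are taken from the comment; the body of
bar_original() and the helper trace_last() are hypothetical and only there
to make the file self-contained.

```c
#include <stdint.h>

/* Stand-in for the real function body; "x + 1" is only a placeholder. */
int bar_original(int x)
{
    return x + 1;
}

/* Buffer of function numbers. Because trace_pos is a uint8_t, the index
   wraps at 256, so only the first 256 of the 512 entries are written. */
unsigned int trace_buffer[512];
uint8_t trace_pos;

#define FUNCTION_NUMBER_bar 0x686

/* Wrapper: record the function number, then forward to the original.
   The final call is in tail position, so an optimizing compiler may emit
   it as a jump instead of a call (or inline bar_original entirely). */
int bar_new(int x)
{
    trace_buffer[trace_pos++] = FUNCTION_NUMBER_bar; /* instrumentation */
    return bar_original(x);
}

/* Hypothetical helper: the most recently recorded function number. */
unsigned int trace_last(void)
{
    return trace_buffer[(uint8_t)(trace_pos - 1)];
}
```

Whether the forwarding call ends up as a jump, a call, or an inlined body
depends on the compiler version and options; inspecting the output of
"gcc -O2 -S" is the simplest way to find out for a given setup.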
[Bug ipa/111643] __attribute__((flatten)) with -O1 runs out of memory (killed cc1)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111643

--- Comment #11 from Lukas Grätz ---
(In reply to Alexander Monakov from comment #10)
> (In reply to Lukas Grätz from comment #9)
> > I also wondered whether
> >
> > int bar_alias (void) { return bar_original(); }
> >
> > could be a portable alternative to attribute alias. Except that current GCC
> > does not translate it that way.
>
> That's because function addresses are significant and so
>
> &bar_alias == &bar_original
>
> must evaluate to false, but would be true for aliases.
>
> In theory compilers could do better by introducing fall-through aliases:
> https://gcc.gnu.org/wiki/cauldron2019talks?action=AttachFile&do=view&target=fallthrough-aliases.pdf

Thanks a lot! I hadn't thought about function addresses. Is there hope that
fall-through aliases get into gcc? Then perhaps my instrumentation
fall-through would also be possible to implement.
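
The point about function addresses can be illustrated with a short sketch.
It assumes a compiler and target that support the GNU alias attribute (for
example GCC or Clang on an ELF platform); bar_wrapper and the two checking
functions are hypothetical names introduced for this illustration only.

```c
#include <stdbool.h>

int bar_original(void)
{
    return 7; /* placeholder body */
}

/* A forwarding wrapper is a distinct function, so it must have an
   address of its own. */
int bar_wrapper(void)
{
    return bar_original();
}

/* GNU extension: a second name for the same entry point. */
int bar_alias(void) __attribute__((alias("bar_original")));

/* The wrapper and the original compare unequal. */
bool wrapper_has_own_address(void)
{
    return bar_wrapper != bar_original;
}

/* The alias and the original compare equal. */
bool alias_shares_address(void)
{
    return bar_alias == bar_original;
}
```

This is why a compiler cannot silently turn the wrapper into an alias: a
program may observe the difference by comparing the two addresses.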
[Bug ipa/111643] __attribute__((flatten)) with -O1 runs out of memory (killed cc1)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111643

--- Comment #14 from Lukas Grätz ---
(In reply to Andrew Pinski from comment #13)
> (In reply to Andrew Pinski from comment #12)
> > Gcc does have tail call optimization which should allow the instrumentation
> > with less overhead. Though tail call optimization happens at -O2 and above
> > only (by default).
>
> The only improvement to this would be fall through alias which allows the
> removal of the jump to the other function. A direct non-conditional jump is
> usually predictable so the overhead should be small still.

Thanks! I thought that there was still some stack involved, also causing
some overhead for every function call (in comparison to a pure
non-conditional jump). When I have time next week, I will try to look into
that in detail.
[Bug c/111786] New: No tail recursion for simple program
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111786

            Bug ID: 111786
           Summary: No tail recursion for simple program
           Product: gcc
           Version: 13.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: lukas.gra...@tu-darmstadt.de
  Target Milestone: ---

Created attachment 56096
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=56096&action=edit
C code of expr_main

Follow-up with nearly the same source file as 111643, only without the
flatten attribute. Sorry for taking so long.

I learned that the optimizing compiler should output a tail recursion. But
this seems not to be the case: with "sub" and "call", 16 bytes on the stack
are used.

The attached file contains:

---
int expr_main(int argc, char **argv) {
    return expr_main_original(argc, argv);
}
---

And after cc1 -O3 on amd64, the output contains:

-- gcc 13.2.0 --
expr_main:
        subq    $8, %rsp
        call    expr_main_original
---

-- gcc 9.4.0 shipped with ubuntu 20.04 --
expr_main:
        endbr64
        pushq   %rax
        popq    %rax
        pushq   %rax
        call    expr_main_original
---

-- Expected --
expr_main:
        jmp     expr_main_original
---

If I compile only the above snippet, I get the expected result. But not when
compiling the whole C file, which also includes the body of
expr_main_original(). I also suspect there are some other factors I don't
know about, since many other functions I tested yield the expected result.

In my case, the overhead seems to be negligible. However, I think it should
be possible to construct similar recursive programs where the overhead
compared to tail recursion is not negligible.
[Bug c/111786] No tail recursion for simple program
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111786

--- Comment #3 from Lukas Grätz ---
(In reply to Jakub Jelinek from comment #1)
> We completely intentionally don't emit tail calls to noreturn functions, so
> that e.g. in case of abort one doesn't need to virtually reconstruct
> backtrace.
> In your case, the interprocedural optimizations determine expr_main_original
> is noreturn and so calls it normally (and optimizes away anything after that
> call).

Thank you very much indeed! (Ah yes, this also explains why there is no
"ret".) And sorry for not realizing that this is a duplicate.

So the "call" is intentionally emitted by gcc for a better debugging
experience. I agree, this makes perfect sense in many cases.

However, the price of better debugging seems to be the danger of a stack
overflow. After I understood your "complete" intention, it took me about
20 min to construct an example which bears a stack overflow following that
intention.

---
void foo(int n) {
    if (n == 0)
        exit(0);
    int x[200];
    for (int i = 0; i < 200; i++)
        extern_function(x[i], x[200-i]);
    return foo(n-1);
}
---

After adding __attribute__((noreturn)), compiling with -O3 and passing 1 to
foo(), I get a segmentation fault.

There is still a warning "function declared ‘noreturn’ has a ‘return’
statement". But in our case, the noreturn attribute is not wrong, because
none of the recursive calls actually return. This might be something that
interprocedural optimizations detect in the future. So even without
attribute noreturn, gcc could decide to produce no tail recursion (because
it is a noreturn function, regardless of the noreturn attribute).

Last remark, then I remain silent. I just learned that clang actually has
the attribute musttail, which would help for my reported C file as well as
in the foo() example above to prevent the stack overflow. But I guess it is
not planned to add musttail to gcc?
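
A minimal sketch of how the musttail statement attribute mentioned above
could be used portably. It assumes the compiler provides __has_attribute;
where musttail is not advertised, the macro expands to nothing and the
function falls back to an ordinary recursive call. The MUSTTAIL macro and
the function count_steps() are hypothetical names for this illustration.

```c
/* Use musttail only where the compiler advertises it; otherwise fall
   back to a plain return statement. */
#if defined(__has_attribute)
#  if __has_attribute(musttail)
#    define MUSTTAIL __attribute__((musttail))
#  endif
#endif
#ifndef MUSTTAIL
#  define MUSTTAIL
#endif

/* Hypothetical countdown: adds 1 to acc for each remaining step.
   Caller and callee have the same signature, which keeps the call
   eligible for a guaranteed tail call. */
long count_steps(long n, long acc)
{
    if (n == 0)
        return acc;
    MUSTTAIL return count_steps(n - 1, acc + 1);
}
```

With the attribute in effect, the recursion should run in constant stack
space for any n; without it, the stack depth depends on whether the
optimizer happens to turn the call into a jump.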
[Bug c/111896] New: call with wrong stack alignment
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111896

            Bug ID: 111896
           Summary: call with wrong stack alignment
           Product: gcc
           Version: 9.4.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: lukas.gra...@tu-darmstadt.de
  Target Milestone: ---

Created attachment 56157
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=56157&action=edit
ccnhCTdD.i.tmp.i

For some reason, I managed to get a SEGV when running a program. I spent
time debugging it, and found out that the problem occurred when executing:

        movaps %xmm0,0x40(%rsp)

It took me some time, but I realized the SEGV was caused by the rsp pointer
being 8 bytes off. It should be aligned to 16 bytes. So wrong alignment.

I also found out where the misalignment happened. See the attached file.
dlist_free_original() is calling freeit(). This is compiled as
dlist_free_original.constprop.0 calling do_line() as follows:

dlist_free_original.constprop.0:
        ...
        pushq   %rbp
        ...
        pushq   %rbx
        ...
        call    do_line

So the stack is misaligned when the call happens. It might be because
do_line() is written in inline asm with __attribute__((naked)).

Starting with gcc 11.3, there seems to be an extra "sub rsp,8" which seems
to solve this. But I was using gcc 9.4.0 (shipped with ubuntu 20.04) on
amd64 linux. A quick check on godbolt showed me that the misalignment still
happens in gcc 11.2. So I am unsure if this is still relevant, but I am
reporting it just in case.

    gcc -O3 -c -S ccnhCTdD.i.tmp.i -o tmp.s

If you need the full executable or anything else, ask me.

Background: I wanted to have a way to record which functions were called
through a pointer. For that, I created a wrapper for every function,
renaming the original function to ..._original. I also created a macro
renaming direct calls to _original so that only calls through a pointer were
left. The wrapper functions do their logging (it takes only a few
instructions) and then sibcall to the respective original function.

A wrapper for vararg functions seemed to be possible only using asm, so I
used asm. Since the other functions might be static, I had to do inline asm
with attribute naked.
[Bug target/111896] call with wrong stack alignment
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111896

--- Comment #2 from Lukas Grätz ---
(In reply to Andrew Pinski from comment #1)
> No I think you are looking into the wrong location.
>
> When a call happens, it pushes a value on the stack aligning the stack that
> is incoming into that function.
>
> In the case of GCC 11.3 and above, there is inlining happening.

Well, I could be mistaken. But I couldn't see the inlining. In GCC 11.3 and
above I get something like:

==
dlist_free_original.constprop.0:
        push    rbp
        push    rbx
        ...
        sub     rsp, 8
        ...
        call    do_line
==

In GCC 11.2 and below it is something like:

==
dlist_free_original.constprop.0:
        push    rbp
        ...
        push    rbx
        ...
        call    do_line
==

And I checked with the gdb debugger that the rsp is indeed misaligned at the
start of do_line(). The alignment was OK at the start of
"dlist_free_original.constprop.0".

==
$ gdb busybox_unstripped
GNU gdb (Ubuntu 9.2-0ubuntu1~20.04.1) 9.2
...
(gdb) break dlist_free_original.constprop.0
Breakpoint 1 at 0x59a7ac
(gdb) break do_line
Breakpoint 2 at 0x59a474
(gdb) run patch -R -i input.patch

Breakpoint 1, 0x0059a7ac in dlist_free_original.constprop ()
(gdb) i r rsp
rsp            0x7fffd998          0x7fffd998
(gdb) c
Continuing.

Breakpoint 2, 0x0059a474 in do_line ()
(gdb) i r rsp
rsp            0x7fffd980          0x7fffd980
==
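
The two rsp values from the gdb session can be checked against the x86-64
System V rule that the stack is 16-byte aligned immediately before a call
instruction. Since the call then pushes an 8-byte return address, a
correctly aligned function entry has rsp % 16 == 8. The sketch below encodes
just that arithmetic; the function name entry_rsp_is_aligned() is
hypothetical, and the addresses are used exactly as printed above.

```c
#include <stdbool.h>
#include <stdint.h>

/* At function entry the return address (8 bytes) has just been pushed
   onto a stack that was 16-byte aligned before the call, so a correctly
   aligned entry satisfies rsp % 16 == 8. */
bool entry_rsp_is_aligned(uint64_t rsp)
{
    return (rsp & 0xf) == 8;
}
```

For 0x7fffd998 the low four bits are 8, so the first breakpoint is
consistent with a correctly aligned entry. For 0x7fffd980 they are 0, so
do_line() is entered with rsp 8 bytes off. The difference between the two
values is 0x18 = 24 bytes, which would match two 8-byte pushes plus the
8-byte return address, with no extra adjustment in between.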
[Bug target/111896] call with wrong stack alignment
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111896

--- Comment #5 from Lukas Grätz ---
Thanks a lot!
[Bug rtl-optimization/38534] gcc 4.2.1 and above: No need to save called-saved registers in 'noreturn' function
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=38534

Lukas Grätz changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |lukas.graetz@tu-darmstadt.d
                   |                            |e

--- Comment #6 from Lukas Grätz ---
(In reply to Sam James from comment #5)
> (In reply to thutt from comment #2)
> PR10837 has some discussion on this point too.

The debugging argument there was about the backtrace. For that, we only need
to follow the calling conventions to save the stack and instruction
pointers. Other registers, including callee-saved registers like r12, r13,
r14 and r15, are not used in a backtrace.
[Bug rtl-optimization/38534] gcc 4.2.1 and above: No need to save called-saved registers in 'noreturn' function
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=38534

--- Comment #7 from Lukas Grätz ---
(In reply to H.J. Lu from comment #4)
> When I compiled __cxxabiv1::__cxa_throw, which is a noreturn function in
> libstdc++-v3/libsupc++/eh_throw.cc not to save callee-saved registers,
> most of C++ exception tests crashed.

Can you tell how you compiled it? Thanks in advance!
[Bug rtl-optimization/38534] gcc 4.2.1 and above: No need to save called-saved registers in 'noreturn' function
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=38534

--- Comment #9 from Lukas Grätz ---
(In reply to H.J. Lu from comment #8)
> (In reply to Lukas Grätz from comment #7)
> > (In reply to H.J. Lu from comment #4)
> > > When I compiled __cxxabiv1::__cxa_throw, which is a noreturn function in
> > > libstdc++-v3/libsupc++/eh_throw.cc not to save callee-saved registers,
> > > most of C++ exception tests crashed.
> >
> > Can you tell how you compiled it? Thanks in advance!
>
> I have a patch to fix it. Please try users/hjl/pr113312/gcc-13 branch:
>
>
> For your testcase, I got

Well, it is not my testcase. But I added backtracing and observed that the
printed backtrace is unchanged with your patch. The new
no_return_to_caller():

void
__attribute__((noreturn))
no_return_to_caller(int a, int b, int c, int d) {
    LOOP_BODY;
#define BT_BUF_SIZE 100
    void *buffer[BT_BUF_SIZE];
    backtrace_symbols_fd(buffer, backtrace(buffer, BT_BUF_SIZE),
                         STDOUT_FILENO);
    while (1);
}

What I observed from the assembly is that %rbp is not saved, whereas %rip
and %rsp are still implicitly saved by the call instruction. But since
glibc's backtrace implementation does not use %rbp, this is fine.

Some amateur speculation, just ignore it: I don't know whether %rbp is the
source of the failed C++ test cases, which also do some stack unwinding.
After looking in the System V ABI specification I am still unsure whether
stack unwinding relies on %rbp or not. Perhaps there is an unnecessary
dependency on %rbp or a missing "-fno-omit-frame-pointer" somewhere in the
gcc internals that causes the problem.
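
For reference, the backtrace call used in this comment can be isolated into
a small helper. This is only a sketch and assumes a libc that provides
<execinfo.h> with backtrace() and backtrace_symbols_fd(), as glibc does;
print_backtrace() is a hypothetical name.

```c
#include <execinfo.h>
#include <unistd.h>

#define BT_BUF_SIZE 100

/* Writes the current call stack to stdout, one frame per line, and
   returns the number of frames that were captured. */
int print_backtrace(void)
{
    void *buffer[BT_BUF_SIZE];
    int frames = backtrace(buffer, BT_BUF_SIZE);

    backtrace_symbols_fd(buffer, frames, STDOUT_FILENO);
    return frames;
}
```

Calling print_backtrace() from a noreturn function, before its final loop,
gives the same kind of output as the modified testcase. Symbol names only
show up if the binary exports them, for example when it is linked with
-rdynamic; otherwise the lines contain raw addresses.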
[Bug rtl-optimization/38534] gcc 4.2.1 and above: No need to save called-saved registers in 'noreturn' function
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=38534

--- Comment #11 from Lukas Grätz ---
(In reply to H.J. Lu from comment #10)
> The C++ test issue is caused by missing callee-saved registers for
> exception supports in noreturn functions in libstdc++. I fixed it by
> keeping callee-saved registers when exception is enabled.
>
> Backtrace with %rbp is unrelated to this. Gcc will skip %rpb without
> -fno-omit-frame-pointer.

Great! Then I guess there is no pitfall in your patch.
[Bug rtl-optimization/38534] gcc 4.2.1 and above: No need to save called-saved registers in 'noreturn' function
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=38534

--- Comment #12 from Lukas Grätz ---
(In reply to H.J. Lu from comment #10)
> The C++ test issue is caused by missing callee-saved registers for
> exception supports in noreturn functions in libstdc++. I fixed it by
> keeping callee-saved registers when exception is enabled.

I guess that exception throwing needs callee-saved registers because it uses
stack unwinding to do something very similar to a return.

void f1(void) {
    CODE, compiler translation uses callee-saved %r12
    f2();
    CODE, compiler translation uses callee-saved %r12
}

void f2(void) {
    f3();
}

void f3(void) {
    CODE, compiler translation uses callee-saved %r12
    f4();
    while(1);
}

void f4(void) {
    CODE, uses loop unwinding functions
      a) restores all callee-saved registers in f3(), f2()
      b) restores %rsp and %rip from stack of f2()
    unreachable();
}

In effect, b) is a return from the call f2() in f1(), although it happens in
f4(). %r12 only needs to be saved in f1() and f3(). Gcc with -O2 would do
that. However, with your patch, %r12 would no longer be saved in f3(). This
can lead to a crash in the second CODE block in f1().

As a solution, your optimization patch should require
__attribute__((nothrow)) in addition to noreturn. The b) in f4() should be
treated as a throw. So none of f1(), f2(), f3() should have the attribute
nothrow.

So in the example of this report, the signature of value() should be
modified to:

extern __attribute__((nothrow)) unsigned value(int i, int j, int k);

Only then is it safe to skip saving callee-saved registers. "nothrow" should
also be added to bar() and fn() in your test case pr38534-2.c.
[Bug rtl-optimization/38534] gcc 4.2.1 and above: No need to save called-saved registers in 'noreturn' function
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=38534

--- Comment #13 from Lukas Grätz ---
(In reply to Lukas Grätz from comment #12)
> CODE, uses loop unwinding functions
>   a) restores all callee-saved registers in f3(), f2()
>   b) restores %rsp and %rip from stack of f2()

I meant stack unwinding. f3() and f2() can be in separate compilation units;
all it needs is ".cfi_offset REGISTER, OFFSET" from the ELF file (also
present in the generated assembly).
[Bug rtl-optimization/38534] gcc 4.2.1 and above: No need to save called-saved registers in 'noreturn' function
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=38534

--- Comment #14 from Lukas Grätz ---
Never mind my above comments. I just realized that attribute nothrow has no
effect in C unless -fexceptions is given. So nothrow is not needed (only
-fno-exceptions). Furthermore, most noreturn functions throw in C++, so
there would be little potential for optimization when exceptions are
enabled.

What puzzles me is that functions like exit() have different signatures in C
and C++. With "gcc -E -fexceptions somefile.cc" I get

extern void exit (int __status) throw () __attribute__ ((__noreturn__));

in C++, and in C I get with "gcc -E -fexceptions somefile.c"

extern void exit (int __status) __attribute__ ((__nothrow__ , __leaf__))
__attribute__ ((__noreturn__));

although exceptions are explicitly enabled in both cases. But I guess this
is a problem in Glibc, not GCC. I will really shut up now, promise!
[Bug rtl-optimization/38534] gcc 4.2.1 and above: No need to save called-saved registers in 'noreturn' function
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=38534

--- Comment #25 from Lukas Grätz ---
(In reply to Jakub Jelinek from comment #19)
> (In reply to H.J. Lu from comment #18)
> > (In reply to Jakub Jelinek from comment #17)
> > > E.g. shouldn't it at least be disabled for -O0 and -Og and shouldn't we
> >
> > We can disable this for -O0 and -Og.
>
> I think we should go for that.
>

This is independent from debugging, but I thought the patch was only meant
for -O3. Have you thought about the following situation:

Compile with gcc -O1 (and -fno-exceptions is implicit):

library.c:

void __attribute__((noreturn)) foo(void (*bar)(void)) {
    ...
    bar();
    while (1);
}

Compile with g++ (and -fexceptions is implicit):

app.c++:

#include <exception>
extern void foo(void (*bar)(void));
extern void bar_throws_exception(void) throw ();

int main() {
    ...
    try {
        foo(bar_throws_exception);
    } catch (const std::exception& e) {
        ...
    }
}

It is not hard to fill in the ... to make it use some callee-saved registers
(e.g. with LOOP_BODY as in this issue report). And then the program would
crash. One might argue that library.c is to blame for the missing
-fexceptions. Or that app.c++ is to blame because it should not call foo
with an argument function that might throw an exception. I am unsure whether
the C++ standard actually forbids calling C library functions with argument
functions that might throw an exception.

So I think it would be safer to restrict the patch to -O3. But I really
don't know much about this.
[Bug rtl-optimization/38534] gcc 4.2.1 and above: No need to save called-saved registers in 'noreturn' function
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=38534

--- Comment #27 from Lukas Grätz ---
(In reply to Jakub Jelinek from comment #26)
> (In reply to Lukas Grätz from comment #25)
> > (In reply to Jakub Jelinek from comment #19)
> > > (In reply to H.J. Lu from comment #18)
> > > > (In reply to Jakub Jelinek from comment #17)
> > > > > E.g. shouldn't it at least be disabled for -O0 and -Og and
> > > > > shouldn't we
> > > >
> > > > We can disable this for -O0 and -Og.
> > >
> > > I think we should go for that.
> > >
> >
> > This is independent from debugging, but I thought the patch was only meant
> > for -O3. Have you thought about the following situation:
>
> Throwing an exception through -fno-exceptions code is UB, don't do that.

Thanks for the info! I guess this UB is implicit in GCC's documentation for
"-fexceptions":

"However, you may need to enable this option when compiling C code that
needs to interoperate properly with exception handlers written in C++."

I would suggest removing the "may" and being clearer:

"However, you need to enable this option when compiling C code that could
(implicitly) propagate an exception (from C++) to an exception handler
written in C++."