from: "lukas.graetz--- via Gcc-bugs"

[Bug rtl-optimization/10837] noreturn attribute causes no sibling calling optimization

2024-02-11 Thread lukas.graetz--- via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=10837

--- Comment #17 from Lukas Grätz  ---
(In reply to Xi Ruoyao from comment #16)
> (In reply to gooncreeper from comment #15)
> > May I suggest we just add something like __attribute__((trace)) for the
> > special abort case? Noreturn was added for code optimization after all, not
> > for backtracing.
> 
> It will break any attempts to debug an abort until the libc headers are
> updated to use __attribute__((trace)).

"any attempts"? We could simply use the gdb debugger and ignore the backtrace.
In comparison, the backtrace is a rather restricted debugging instrument.

If there are applications that really depend on GCC's backtrace, this should be
the reason to keep the current behaviour.

> 
> Note that in GCC noreturn has been added far before the WG14 _Noreturn paper
> (even this ticket predates the WG14 paper), so the rationale in the paper
> may not apply.

Backtracing functionality is highly platform dependent, so there is no surprise
that the C standard cannot guarantee anything about it.

> 
> In practice most _Noreturn functions are abort, exit, ..., i.e. they are
> only executed one time so optimizing against a cold path does not help much.
> I don't think it's a good idea to encourage people to construct some fancy
> code by a recursive _Noreturn function (why not just use a loop?!)

... and why not just if and goto? Because it is considered good programming
practice to structure source code into functions (not too long) and loops. If a
function gets too big, GCC might not optimize it well.

>  And if
> you must write such fancy code anyway IMO musttail attribute (PR83324) will
> be a better solution.

I agree.

[Bug rtl-optimization/10837] noreturn attribute causes no sibling calling optimization

2024-02-14 Thread lukas.graetz--- via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=10837

--- Comment #18 from Lukas Grätz  ---
On second thought: I think something like -fignore-backtrace could be a
reasonable optimization flag (enabled by default for -O4). By ignoring the
backtrace we could do other optimizations on size and speed, like in this
ticket and its duplicates.

There are use cases for that, see some of the duplicate tickets. For example in
PR56165, they didn't want to support any debugging at all. And even if you want
debugging, you might want to disregard backtraces and use a more sophisticated
debugging device. This is independent from attribute musttail, with
-fignore-backtrace we would leave GCC more freedom to do optimization.

[Bug rtl-optimization/10837] noreturn attribute causes no sibling calling optimization

2024-02-26 Thread lukas.graetz--- via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=10837

--- Comment #20 from Lukas Grätz  ---
(In reply to Petr Skocik from comment #19)
> IMO(In reply to Xi Ruoyao from comment #16)
>  
> > In practice most _Noreturn functions are abort, exit, ..., i.e. they are
> > only executed one time so optimizing against a cold path does not help much.
> > I don't think it's a good idea to encourage people to construct some fancy
> > code by a recursive _Noreturn function (why not just use a loop?!)  And if
> > you must write such fancy code anyway IMO musttail attribute (PR83324) will
> > be a better solution.
> 
> There's also longjmp, which may not be all that super cold and may be
> executed multiple times. And while yeah, nobody will notice a single call vs
> jmp time save against a process spawn/exit, for a longjmp wrapper, it'll
> make it a few % faster (as would utilizing _Noreturn attributes for better
> register allocation: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114097,
> which would also save a bit of codesize too). Taillcalls can also save a bit
> of codesize if the target is near.


Just to emphasize, tail call optimization is not just for speed. It is
essential to avoid wasting stack space. In particular, to avoid potential stack
overflows, it should _not_ be necessary to replace all recursion with loops,
as Xi Ruoyao suggests. Ah, and I also think that recursion in C is not fancy
(anymore), since everyone expects the compiler to do sibcall or similar
optimizations. Noreturn functions are the exception to that. So it would indeed
be consistent to do sibcall optimization for noreturn functions, too!

Personally, I would be satisfied with the new attribute musttail to enforce
tail calls whenever necessary (given that this will be available for C, not C++
only). But speed-wise, musttail might not have the desired effect. It is meant
for preserving stack space.
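
To make the stack-space argument concrete, here is a minimal sketch (the names sum_to and sum_to_acc are hypothetical and not from this ticket). The recursive call is in tail position, so a compiler that does sibling-call optimization can reuse the current frame instead of growing the stack. Whether GCC actually does this depends on the optimization level and target, so the sketch bounds the depth to stay safe either way.

```c
#include <assert.h>

/* Tail-recursive sum of 1..n with an accumulator. The recursive call is the
   last action of the function, so it is a candidate for sibling-call
   optimization (a jmp instead of call + ret). */
static unsigned long sum_to_acc(unsigned long n, unsigned long acc)
{
    if (n == 0)
        return acc;
    return sum_to_acc(n - 1, acc + n);   /* call in tail position */
}

unsigned long sum_to(unsigned long n)
{
    /* Keep the depth modest in case the call is not turned into a jump. */
    assert(n <= 100000);
    return sum_to_acc(n, 0);
}
```

Comparing the output of gcc -O0 -S and gcc -O2 -S for this file is one way to check whether the inner call became a jump on a given setup.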

---

Following Petr Skocik, I quick-tested on my computer:

= longjmp_wrapper.c =
#include <setjmp.h>

__attribute__((noreturn))
void longjmp_wrapper(jmp_buf env, int val) {
    longjmp(env, val);
}

= longjmp_main.c =
#include <limits.h>
#include <setjmp.h>

__attribute__((noreturn))
void longjmp_wrapper(jmp_buf env, int val);

int main(void) {
    jmp_buf env;
    for (int i = 0; i < INT_MAX; i++) {
        if (setjmp(env) == 0) {
            longjmp_wrapper(env, 1);
        }
    }
}
=

After compiling with

$ gcc -O3 -m32 -c -S longjmp_wrapper.c -o longjmp_wrapper.S

I copied and manually modified the generated longjmp_wrapper.S as follows:

9,15c9
<   subl    $20, %esp
<   .cfi_def_cfa_offset 24
<   pushl   28(%esp)
<   .cfi_def_cfa_offset 28
<   pushl   28(%esp)
<   .cfi_def_cfa_offset 32
<   call    longjmp
---
>   jmp     longjmp


Then I compiled both versions with longjmp_main.c, again with -m32. Measured
with "time", the sibcall and unmodified versions took around 23.5 sec and 24.5
sec, respectively, on my computer. So around 4 % improvement for 32 bit x86.
For 64 bit x86, both took around 18 secs without noticeable speed difference
(perhaps because both arguments are passed in registers instead of on the stack
by the 64 bit calling conventions).
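
For anyone who wants to repeat the measurement with fewer iterations, the loop can be sketched as a function that counts completed round trips. This is only a sketch: count_round_trips is a hypothetical name, and the noinline attribute is my addition so that the wrapper call stays visible in a single translation unit.

```c
#include <assert.h>
#include <setjmp.h>

/* Same wrapper as in longjmp_wrapper.c; noinline is an added assumption so
   the call is not folded away when everything lives in one file. */
__attribute__((noreturn, noinline))
static void longjmp_wrapper(jmp_buf env, int val)
{
    longjmp(env, val);
}

/* Performs n setjmp/longjmp round trips and returns how many completed. */
long count_round_trips(long n)
{
    jmp_buf env;
    /* volatile as a precaution: non-volatile locals changed between setjmp
       and longjmp have indeterminate values afterwards. */
    volatile long i;
    volatile long completed = 0;

    assert(n >= 0);
    for (i = 0; i < n; i++) {
        if (setjmp(env) == 0)
            longjmp_wrapper(env, 1);   /* never returns; resumes at setjmp */
        completed++;                   /* reached only on the longjmp path */
    }
    return completed;
}
```

Timing a call such as count_round_trips(100000000) with and without the manual jmp modification should show the same tendency as above, but the exact numbers will depend on the machine.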

[Bug rtl-optimization/38534] gcc 4.2.1 and above: No need to save called-saved registers in 'noreturn' function

2024-02-27 Thread lukas.graetz--- via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=38534

--- Comment #29 from Lukas Grätz  ---
(In reply to Jakub Jelinek from comment #28)
> (In reply to Lukas Grätz from comment #9)
> > Well it is not my testcase. But I added backtracing and observed that the
> > printed backtrace is unchanged with your patch. The new
> > no_return_to_caller():
> 
> You haven't tried hard enough.


That might be true.


> Consider the testcase I've posted to the mailing list, built with -Og -g.

> The gcc trunk hits the backtrace not possible problem because rbp is
> 
> clobbered and needed in upper frame CFA computation:


Yes, when a backtrace is based on rbp, one needs -fno-omit-frame-pointer. I
trusted comment #10 here, as it made sense.


> And in the patched gcc (with PR114116 patch to save bp register) backtrace
> works but several of the values are bogus:  

> #2  0x004011d2 in baz (a=a@entry=42, b=b@entry=43, c=c@entry=44,
> d=d@entry=-559038737, e=e@entry=-559038737, f=f@entry=-559038737, g=48,
> h=49) at /tmp/1.c:38


glibc's backtrace() function and friends only report function names and
addresses. This looks like the gdb bt command. I admit, I did not take a proper
look into that before.
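
For comparison, this is roughly what a glibc-based backtrace looks like in code. It is a minimal sketch assuming glibc's <execinfo.h>; print_backtrace and MAX_FRAMES are hypothetical names. The output contains only return addresses and, where available, symbol names, never parameter values.

```c
#include <assert.h>
#include <execinfo.h>   /* glibc-specific: backtrace(), backtrace_symbols() */
#include <stdio.h>
#include <stdlib.h>

#define MAX_FRAMES 32

/* Prints the current call chain and returns the number of frames captured. */
int print_backtrace(void)
{
    void *frames[MAX_FRAMES];
    int depth = backtrace(frames, MAX_FRAMES);
    char **symbols;

    assert(depth >= 0 && depth <= MAX_FRAMES);
    symbols = backtrace_symbols(frames, depth);
    if (symbols == NULL)
        return depth;                  /* out of memory: nothing to print */
    for (int i = 0; i < depth; i++)
        printf("#%d %s\n", i, symbols[i]);
    free(symbols);                     /* one free() for the whole array */
    return depth;
}
```

Linking with -rdynamic usually gives more readable symbol names in this output.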

I believe this could and should somehow be fixed by adding DWARF info that
certain callee-saved registers (= the function parameter values) were
overwritten. The corrected backtrace could look something like this:


#2  0x004011d2 in baz (a=42, b=43, c=44, d=<optimized out>,
e=<optimized out>, f=<optimized out>, g=48, h=49) at /tmp/1.c:38


Some parameters would be <optimized out>, and this would be fine because the
code was partially compiled with -O2. It is not unusual to have <optimized out>
parameter values in gdb's bt.


> So, I think we should limit this to -fno-unwind-tables or maybe
> -mcmodel=kernel.


Now I am confused. The optimization is limited to -fexceptions. And the
documentation of -funwind-tables says "Similar to -fexceptions, except". So
shouldn't -funwind-tables behave similar to -fexceptions? I don't see anything
kernel-specific here.

[Bug rtl-optimization/38534] gcc 4.2.1 and above: No need to save called-saved registers in 'noreturn' function

2024-02-27 Thread lukas.graetz--- via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=38534

--- Comment #31 from Lukas Grätz  ---
(In reply to Jakub Jelinek from comment #30)
> (In reply to Lukas Grätz from comment #29)
> > Yes, when a backtrace is based on rbp, one needs -fno-omit-frame-pointer. I
> > trusted comment #10 here, as it made sense.
> 
> See PR114116.
> 
> > glibc's backtrace() function and friends only reports function names and
> > addresses. This looks like the gdb bt command. I admit, I did not take a
> > proper look into that before.
> 
> Yes, it is gdb bt.  And it is what people heavily rely on for debugging, if
> something fails an assertion or aborts etc., they want to figure out why.
> 

True.

> > I belief this could and should be somehow be fixed by adding DWARF info that
> > certain callee-saved registers (= the function parameter values) were
> > overwritten. The corrected backtrace could look something like this:
> 
> That can be arranged by emitting those .cfi_undefined directives...
>  
> > #2  0x004011d2 in baz (a=42, b=43, c=44, d=,
> > e=, f=, g=48, h=49) at /tmp/1.c:38
> 
> ... but really will not help users to debug/fix their code.


Even when I compile a simple program with gcc -O2 -g:


#include <stdlib.h>
int main(int argc, char** argv) {
    abort();
}


I still get an "argc=<optimized out>":

(gdb) bt
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x77dcd859 in __GI_abort () at abort.c:79
#2  0x00401046 in main (argc=<optimized out>, argv=<optimized out>) at
simple.c:4


Yes, for better debugging, it would be nice if optimised code would just not
be optimised... But this goes against optimization.


> > > So, I think we should limit this to -fno-unwind-tables or maybe
> > > -mcmodel=kernel.
> > Now I am confused. The optimization is limited to -fexceptions. And the
> > documentation of -funwind-tables says "Similar to -fexceptions, except". So
> > shouldn't -funwind-tables behave similar to -fexceptions? I don't see
> > anything kernel-specific here.
> 
> Given that even with -fno-asynchronous-unwind-tables (or -fno-unwind-tables)
> gcc emits
> the unwind info, just not into .eh_frame but .debug_frame, we shouldn't
> disable it
> just when not emitting .eh_frame, but should just disable it always.
> There is a reason why it has been rejected years ago.
> If anything, guard it with some non-default -m* option and explain the
> consequences to users if they use it.  Still, the guarding IMHO should be
> done on top of the PR114116
> change, because having random crashes from backtrace or gdb bt even when
> user asked for it is a bad idea.


Yes, it is a bad idea to have crashes from backtrace or gdb. But when this is
only about <optimized out>, I don't see the point of disabling it always.

[Bug rtl-optimization/38534] gcc 4.2.1 and above: No need to save called-saved registers in 'noreturn' function

2024-02-27 Thread lukas.graetz--- via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=38534

--- Comment #33 from Lukas Grätz  ---
(In reply to Jakub Jelinek from comment #32)
> (In reply to Lukas Grätz from comment #31)
> > Even when I compile a simple program with gcc -O2 -g:
> > 
> > #include 
> > int main(int argc, char** argv) {
> > abort();
> > }
> > 
> > 
> > I still get an "argc=":
> 
> Sure, debugging info in optimized code is best effort.
> 
> > Yes, for a better debugging, it would be nice if optimised code would just
> > not be optimised... But this goes against optimization.
> 
> The significant difference between other optimizations and this one is
> that normal optimizations affect the debuggability of the optimized function.
> This one affects the debuggability of all callers as well, even if they are
> compiled in a way that should make them more debuggable.
> Normally, if debugging optimized code doesn't work out, one can simply
> rebuild that code with -O0 or -Og to make it more debuggable.
> Here one would also need to rebuild all the shared libraries it uses.

When the debugger is inside the debuggable -O0 or -Og compiled function, we
would see all parameters and current variable values. However, in the bt
example, we are in another function. So the parameters are only available at
best effort.

I just noticed that for my simple.c example above, I get "argc=<optimized out>"
even with -Og. However, when the breakpoint is somewhere else,

(gdb) break main
(gdb) run
(gdb) bt

I get the correct "argc=1". The same applies to your example with "break baz".
It is just not guaranteed that gdb is able to reconstruct function parameters
when we are in some other function.

[Bug rtl-optimization/38534] gcc 4.2.1 and above: No need to save called-saved registers in 'noreturn' function

2024-02-27 Thread lukas.graetz--- via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=38534

--- Comment #36 from Lukas Grätz  ---
(In reply to Jakub Jelinek from comment #35)
> If I hand edit the gcc trunk + PR114116 patch assembly, add to bar
> + .cfi_undefined 3
> + .cfi_undefined 12
> + .cfi_undefined 13
> + .cfi_undefined 14
> + .cfi_undefined 15
> then bt in gdb shows
> #2  0x004011d2 in baz (a=a@entry=42, b=b@entry=43, c=c@entry=44,
> d=, 
> e=, f= reading variable: value has been optimized out>, g=48, h=49) at /tmp/1.c:38


I can confirm that. What bothers me is the wording "d=<error reading variable:
value has been optimized out>" and not just "d=<optimized out>".


(gdb) run
Starting program: bar-artificial-mod 

Program received signal SIGABRT, Aborted.

(gdb) bt
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x77dcd859 in __GI_abort () at abort.c:79
#2  0x004011b1 in bar () at bar-artificial.c:30
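#3  0x004011d2 in baz (a=a@entry=42, b=b@entry=43, c=c@entry=44,
d=<error reading variable: value has been optimized out>,
e=<error reading variable: value has been optimized out>,
f=<error reading variable: value has been optimized out>,
g=48, h=49) at bar-artificial.c:38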
#4  0x004012aa in qux () at bar-artificial.c:55
#5  0x004012e4 in main () at bar-artificial.c:62

(gdb) p a
No symbol "a" in current context.
(gdb) p b
No symbol "b" in current context.


> and everything in qux live across the call is  as well,
> (gdb) p $r12
> $10 = 
> etc. while without that
> (gdb) p a
> $1 = 
> (gdb) p b
> $2 = 
> (gdb) p c
> $3 = 
> (gdb) p d
> $4 = -559038737
> (gdb) p e
> $5 = -559038737
> (gdb) p f
> $6 = -559038737
> (gdb) p g
> $7 = -559038737
> (gdb) p h
> $8 = -559038737
> (gdb) p $r12
> $9 = 3735928559


Where did you set the breakpoint? When I set it somewhere in qux (after
a,b,c,... were initialized), I get conclusive results:


(gdb) break bar-artificial.c:52
Breakpoint 1 at 0x40124a: file bar-artificial.c, line 52.
(gdb) run
Breakpoint 1, qux () at bar-artificial.c:52
52        corge (__builtin_alloca (foo (52)));
(gdb) p a
$1 = 42
(gdb) p b
$2 = 43
(gdb) p c
$3 = 44
(gdb) p d
$4 = 45
(gdb) p e 
$5 = 46
(gdb) p f
$6 = 47
(gdb) p g
$7 = 48
(gdb) p h
$8 = 49
(gdb) p $r12
$9 = 46

[Bug rtl-optimization/38534] gcc 4.2.1 and above: No need to save called-saved registers in 'noreturn' function

2024-02-27 Thread lukas.graetz--- via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=38534

--- Comment #38 from Lukas Grätz  ---
(In reply to Jakub Jelinek from comment #37)
> Nowhere, just run and when it stops due to abort, just up several times
> until reaching the appropriate frame.


I see, this gives me:


(gdb) frame 4
#4  0x004012aa in qux () at bar-artificial.c:55
55        baz (a, b, c, d, e, f, g, h);
(gdb) p a
$1 = 42
(gdb) p b
$2 = 43
(gdb) p c
$3 = 44
(gdb) p d
$4 = 
(gdb) p e
$5 = 
(gdb) p f
$6 = 
(gdb) p g
$7 = 
(gdb) p h
$8 = 
(gdb) p $r12
$9 = 


I checked the dwarf:


$ llvm-dwarfdump bar-artificial-mod
[...]
0x009f:   DW_TAG_subprogram
DW_AT_external  (true)
DW_AT_name  ("qux")
DW_AT_decl_line (42)
DW_AT_prototyped(true)
DW_AT_low_pc(0x004011d2)
DW_AT_high_pc   (0x004012db)
DW_AT_frame_base(DW_OP_call_frame_cfa)
DW_AT_call_all_calls(true)
DW_AT_sibling   (cu + 0x02f0)
[...]
0x00ee: DW_TAG_variable
  DW_AT_name("d")
  DW_AT_decl_line   (47)
  DW_AT_decl_column (0x07)
  DW_AT_type(cu + 0x0060 "int")
[...]
$ objdump -W bar-artificial-mod
[...]
 <2>: Abbrev Number: 2 (DW_TAG_variable)
   DW_AT_name: d
   DW_AT_decl_file   : 1
   DW_AT_decl_line   : 47
   DW_AT_decl_column : 7
   DW_AT_type: <0x60>
   DW_AT_location: 0x6e (location list)
   DW_AT_GNU_locviews: 0x6a
[...]
Contents of the .debug_loclists section:
[...]
006e v000 v000 views at 006a for:
 00401216 0040121f (DW_OP_reg0 (rax))
0075 v000 v000 views at 006c for:
 0040121f 004012d1 (DW_OP_reg3 (rbx))
007c 
[...]


The problem is that we are not within the loclist range. So in principle, we
cannot get the value of the variable, the variable is just not visible. But
since gdb is very clever, it searched whether either the value of rax or rbx
from within the loclist range remained somewhere. And apparently, for the
version without the patch, the value of rbx was saved. For the optimized
version with the patch, rbx was not saved, so the value could not be
reconstructed.

In my opinion, it is just fancy that gdb can do that.


Coming back to the "simple.c" example:


$ objdump -W simple
[...]
 <2>: Abbrev Number: 3 (DW_TAG_formal_parameter)
   DW_AT_name: (indirect string, offset: 0x85): argc
   DW_AT_decl_file   : 1
   DW_AT_decl_line   : 2
   DW_AT_decl_column : 14
   DW_AT_type: <0x35>
   DW_AT_location: 0x10 (location list)
   DW_AT_GNU_locviews: 0xc
[...]
Contents of the .debug_loclists section:
[...]
0010 v000 v000 views at 000c for:
 00401126 0040112e (DW_OP_reg5 (rdi))
0015 v000 v000 views at 000e for:
 0040112e 0040112f (DW_OP_entry_value: (DW_OP_reg5
(rdi)); DW_OP_stack_value)
001d 
[...]


And rdi was saved nowhere, regardless of the patch. So gdb could not
reconstruct the value of argc.

[Bug debug/114144] New: Variables optimized out by -Og

2024-02-27 Thread lukas.graetz--- via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114144

Bug ID: 114144
   Summary: Variables optimized out by -Og
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: debug
  Assignee: unassigned at gcc dot gnu.org
  Reporter: lukas.gra...@tu-darmstadt.de
  Target Milestone: ---

-Og seems to optimize out some variable values: On x86-64, function
parameters are lost after calling other functions because they were not saved
(to the stack). This could be undesired when debugging. Consider the following
example:

= foo.c =
int foo (int i) {
    return i + 1;
}

= caller.c =
extern int foo(int);
int main(int argc, char **argv) {
    int i = foo(argc);
    int v = foo(i);
    return v;
}


   Compile on x86-64:

$ gcc -Og -g caller.c foo.c -o caller

   Debug with breakpoint after calling foo():

$ gdb caller

Reading symbols from caller...
(gdb) break caller.c:5
Breakpoint 1 at 0x401116: file caller.c, line 5.
(gdb) run
Starting program: /home/lukas/test/caller 

Breakpoint 1, main (argc=<optimized out>, argv=<optimized out>) at caller.c:5
5   return v;
(gdb) print argc
$1 = <optimized out>
(gdb) print i
$2 = <optimized out>
(gdb) print v
$3 = 3

---

For 32-bit x86 with "-m32 -Og -g" we would get argc and argv because the
calling conventions put them on the stack and not in a caller-saved register.
However, the variable i would still be <optimized out> at the breakpoint.

---
EXPECTED RESULT by compiling the same with -O0:

$ gcc -O0 -g caller.c foo.c -o caller

$ gdb caller

(gdb) break caller.c:5
Breakpoint 1 at 0x40112f: file caller.c, line 5.
(gdb) run
Starting program: /home/lukas/test/caller

Breakpoint 1, main (argc=1, argv=0x7fffdc98) at caller.c:5
5   return 0;
(gdb) print argc
$1 = 1
(gdb) print i
$2 = 2
(gdb) print v
$3 = 3
---

This is not a problem of the debugger gdb: By looking at the DWARF debugging
info, it turns out that argc has indeed been optimised out:


$ objdump -W caller
[...]
 <2><6d>: Abbrev Number: 1 (DW_TAG_formal_parameter)
<6e>   DW_AT_name: (indirect string, offset: 0x11): argc
<72>   DW_AT_decl_file   : 1
<72>   DW_AT_decl_line   : 2
<72>   DW_AT_decl_column : 14
<73>   DW_AT_type: <0x44>
<77>   DW_AT_location: 0x10 (location list)
<7b>   DW_AT_GNU_locviews: 0xc
[...]
Contents of the .debug_loclists section:
[...]
0010 v000 v000 views at 000c for:
 00401106 0040110e (DW_OP_reg5 (rdi))
0015 v000 v000 views at 000e for:
 0040110e 0040111b (DW_OP_entry_value: (DW_OP_reg5
(rdi)); DW_OP_stack_value)
001d 
[...]


Our breakpoint at location 0x401116 is inside the second range of the loclist.
However, we cannot compute "DW_OP_entry_value: (DW_OP_reg5 (rdi))" at 0x401116,
since it refers to a state at a previous location (the value of rdi at the
subprogram entry). Also, from the disassembly you can clearly see that
caller-saved registers %edi and %eax are not saved (neither to the stack nor to
callee-saved registers) before calling foo:

$ objdump -d caller
[...]
00401106 <main>:
  401106:   48 83 ec 08             sub    $0x8,%rsp
  40110a:   e8 0c 00 00 00          callq  40111b <foo>
  40110f:   89 c7                   mov    %eax,%edi
  401111:   e8 05 00 00 00          callq  40111b <foo>
  401116:   48 83 c4 08             add    $0x8,%rsp
  40111a:   c3                      retq
[...]
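
The location-list lookup described above can be sketched in a few lines of C. This is only an illustration of the idea, not how gdb implements it: struct loc_entry and find_loc_entry are hypothetical names, and the two entries are copied from the .debug_loclists dump for argc. Each range is half-open, i.e. the end address is the first address past the range.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One location-list entry: expr describes the variable for pc in [low, high). */
struct loc_entry {
    uint64_t low;
    uint64_t high;
    const char *expr;
};

/* The two entries for argc, taken from the dump of caller shown above. */
static const struct loc_entry argc_loclist[] = {
    { 0x401106, 0x40110e, "DW_OP_reg5 (rdi)" },
    { 0x40110e, 0x40111b,
      "DW_OP_entry_value: (DW_OP_reg5 (rdi)); DW_OP_stack_value" },
};

/* Returns the index of the entry covering pc, or -1 if no entry does. */
int find_loc_entry(uint64_t pc)
{
    size_t n = sizeof argc_loclist / sizeof argc_loclist[0];

    for (size_t i = 0; i < n; i++) {
        assert(argc_loclist[i].low < argc_loclist[i].high);
        if (pc >= argc_loclist[i].low && pc < argc_loclist[i].high)
            return (int)i;
    }
    return -1;
}
```

For the breakpoint address 0x401116 this selects the second entry, the DW_OP_entry_value expression, which the debugger cannot evaluate once rdi has been overwritten.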


And if you don't like breakpoints, you could modify caller.c as follows to
automatically break by calling abort:

= caller2.c =
#include <stdlib.h>
extern int foo(int);
int main(int argc, char **argv) {
    int i = foo(argc);
    int v = foo(i);
    abort();
}
=

$ gcc -Og -g caller2.c foo.c -o caller2
$ gdb ./caller2
(gdb) run
Program received signal SIGABRT, Aborted.
(gdb) bt
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x77dcd859 in __GI_abort () at abort.c:79
#2  0x0040113b in main (argc=<optimized out>, argv=<optimized out>)
at caller2.c:6

Here, too, we have argc=<optimized out>.


--
According to the documentation of -Og:

"-Og should be the optimization level of choice for the standard
edit-compile-debug cycle, offering a reasonable level of optimization while
maintaining fast compilation and a good debugging experience."

"It is a better choice than -O0 for producing debuggable code because some
compiler passes that collect debug information are disabled at -O0."

But for many cases, -O0 currently seems to be the better choice. Because -O0
saves all values to the stack, they will not be <optimized out>. If -Og is
working as

[Bug rtl-optimization/38534] gcc 4.2.1 and above: No need to save called-saved registers in 'noreturn' function

2024-02-28 Thread lukas.graetz--- via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=38534

--- Comment #40 from Lukas Grätz  ---
(In reply to Jakub Jelinek from comment #30)
> (In reply to Lukas Grätz from comment #29)
> > I belief this could and should be somehow be fixed by adding DWARF info that
> > certain callee-saved registers (= the function parameter values) were
> > overwritten. The corrected backtrace could look something like this:
> 
> That can be arranged by emitting those .cfi_undefined directives...
>  
> > #2  0x004011d2 in baz (a=42, b=43, c=44, d=,
> > e=, f=, g=48, h=49) at /tmp/1.c:38
> 
> ... but really will not help users to debug/fix their code.
> 

It seems that the reason for <optimized out> is ultimately -Og, not this patch.
See Bug 78685. When compiling and debugging your program with -O0 instead,
there is not a single <optimized out>.

[Bug rtl-optimization/38534] gcc 4.2.1 and above: No need to save called-saved registers in 'noreturn' function

2024-02-28 Thread lukas.graetz--- via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=38534

--- Comment #42 from Lukas Grätz  ---
(In reply to Jakub Jelinek from comment #41)
> (In reply to Lukas Grätz from comment #40)
> > It seems that the reason for  is ultimately -Og, not this
> > patch. See Bug 78685.
> 
> No.  When PR78685 would be fixed by adding artificial hidden uses of
> variables at the end of their scopes, this bug would trigger far more often.
> The vars would be live across the calls, so if there would be callee-saved
> registers available, the compiler
> would use them to hold the variables across the calls.  And this bug would
> break that.

It could be done that way. But I think a better fix for PR78685 would be to
save the function parameter values to the stack (and then this problem will not
trigger that often). For the following reasons:

(1) Timings for push and mov instructions are similar, so the execution speed
wouldn't be much affected.

(2) A callee needs to somehow restore callee-saved registers, but only if it
returns. So the calling conventions cannot guarantee that callee-saved
registers are saved somewhere for noreturn functions. But of course, if you
disregard this optimization, this would not trigger that often.

(3) Potential register pressure when saving additional variables to
callee-saved registers: If the execution itself no longer needs the value of a
function parameter, there is no need to hold it in a (callee-saved) register
across calls for quick access. The stack is sufficient for accessing the
values with the debugger.

(4) The entry values of function parameters should be more helpful, not some
later values. E.g., for

int foo(int i) {
    if (i == 42) { h(); }
    i = 7;
    bar();
}

we would be more interested in the original value of "i" and not the later
value "i = 7" as saved by "artificial hidden uses of variables at the end of
their scopes". By saving original values to the stack before they are modified,
we can keep inspecting the original values.

The helpful backtrace from within bar() could be:

#1   bar()
#2   foo(i@entry=42)

The other version would be a bit counter-intuitive, when the argument to foo
really was i=42:

#1   bar()
#2   foo(i=7)
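
Until something like this exists in the compiler, the effect can be imitated by hand in the source. The following is a minimal sketch of that workaround, not of the proposed compiler change: h() and bar() are empty stand-ins, i_entry is a hypothetical name, and the return value is only there to show that the entry value survives the overwrite. In practice a volatile local is kept in memory, so a debugger can still read it after i has changed.

```c
#include <assert.h>

/* Hypothetical stand-ins for the h() and bar() of the example above. */
static int h_calls;
static void h(void)   { h_calls++; }
static void bar(void) { }

/* Returns the value i had on entry, although i itself is overwritten. */
int foo(int i)
{
    volatile int i_entry = i;   /* copy of the original argument, kept in memory */

    if (i == 42)
        h();
    i = 7;
    bar();
    assert(i == 7);             /* the parameter now holds the later value */
    return i_entry;
}
```

With a breakpoint in bar(), "up" followed by "print i_entry" in gdb would then be expected to show the original argument.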

Btw., function parameters are not normally part of the backtrace (this is just
a nice gdb feature), see Wikipedia:

https://en.wikipedia.org/wiki/Stack_trace

> Anyway, I've posted 
> https://gcc.gnu.org/pipermail/gcc-patches/2024-February/646649.html
> patch which will not revert the #c15/#c24 changes, but guard them with a
> non-default option.  People who don't care about the harder debugging can
> use that option in their code, but widely used shared libraries with
> noreturn entrypoints will no longer screw up the debugging for all the
> packages that use them.

Yes, it took me a while, but I agree, it would be better not to worsen the
debugging experience.

[Bug rtl-optimization/38534] gcc 4.2.1 and above: No need to save called-saved registers in 'noreturn' function

2024-02-28 Thread lukas.graetz--- via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=38534

--- Comment #43 from Lukas Grätz  ---
(In reply to Lukas Grätz from comment #42)
> (In reply to Jakub Jelinek from comment #41)
> > 
> > No.  When PR78685 would be fixed by adding artificial hidden uses of
> > variables at the end of their scopes, this bug would trigger far more often.
> > The vars would be live across the calls, so if there would be callee-saved
> > registers available, the compiler
> > would use them to hold the variables across the calls.  And this bug would
> > break that.
> 
> It could be done that way. But I think a better fix for PR78685 would be to
> save the function parameter values to the stack (and than this problem will
> not trigger that often). For the following reasons:
> 


Just to be complete with the arguments:

(5) Artificial hidden uses of variables at the end of their scopes would not
always help when variables are overwritten. For example:

int main (int argc, char **argv) {
    if (argc == 42) { h(); }
    might_not_return(0);
    argc = bar();
    // here would be the hidden use of argc and argv
}

The "artificial hidden use" approach would only save the last value of argc,
here the result of bar() in line 4 and not the argument argc. The argument
value of argc is not used from line 3 on. So that approach would still produce
a backtrace with argc=<optimized out>, something like:

#1 might_not_return(i=0)
#2 main (argc=<optimized out>, argv=0x7fffe0)

(6) When the goal is just to have a more helpful gdb bt output, then we don't
need to save any variables other than function parameters. In the original
example in Bug 78685 and Comment 28 here, this seemed to be the main goal, to
get gdb bt more conclusive. If interested in other variable values, too, -O0
might be better than trying hard to patch -Og to save all variable values.

(7) Bug 78685 is for x86-64 with -Og. For 32 bit x86 with -Og, we don't run
into that problem: there are no <optimized out> function parameters, since they
are already on the stack by the 32 bit calling conventions. So saving
parameters on the stack for -Og on x86-64 and similar targets without
stack-parameters would just be consistent.

[Bug target/114116] [14 Regression] Broken backtraces in bootstrapped x86_64 gcc

2024-02-29 Thread lukas.graetz--- via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114116

Lukas Grätz  changed:

   What|Removed |Added

 CC||lukas.graetz@tu-darmstadt.d
   ||e

--- Comment #7 from Lukas Grätz  ---
(In reply to H.J. Lu from comment #6)
> (In reply to Jakub Jelinek from comment #5)
> > Yeah.  Not to mention, one can call backtrace even if -g0; you just don't
> > get nice names for the addresses.  Without the patch you get crashes in the
> > unwinder when doing backtrace.
> 
> Should we generate REG_CFA_UNDEFINED for unsaved callee-saved registers to
> help unwinder:
> 
> https://patchwork.sourceware.org/project/gcc/list/?series=30327

Yes. Also for gdb this is needed.

Perhaps I did something wrong. On my computer, I could get the first patch
working to save rbp, and I also applied the patch which should emit the
.cfi_undefined. But somehow, I still do not get .cfi_undefined for any of the
examples.


$ ./gcc/host-x86_64-pc-linux-gnu/gcc/cc1 -O3
gcc/gcc/testsuite/gcc.target/i386/pr38534-7.c -o pr38534-7.S

$ cat pr38534-7.S
[...]
no_return_to_caller:
.LFB0:
        .cfi_startproc
        pushq   %rbp
        .cfi_def_cfa_offset 16
        .cfi_offset 6, -16
        movl    $array+67108860, %eax
        xorl    %r13d, %r13d
[...]


The ".cfi_undefined 13" is still missing...

[Bug target/114116] [14 Regression] Broken backtraces in bootstrapped x86_64 gcc

2024-02-29 Thread lukas.graetz--- via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114116

--- Comment #9 from Lukas Grätz  ---
(In reply to H.J. Lu from comment #8)
> (In reply to Lukas Grätz from comment #7)
> > (In reply to H.J. Lu from comment #6)
> > > (In reply to Jakub Jelinek from comment #5)
> > > > Yeah.  Not to mention, one can call backtrace even if -g0; you just 
> > > > don't
> > > > get nice names for the addresses.  Without the patch you get crashes in 
> > > > the
> > > > unwinder when doing backtrace.
> > > 
> > > Should we generate REG_CFA_UNDEFINED for unsaved callee-saved registers to
> > > help unwinder:
> > > 
> > > https://patchwork.sourceware.org/project/gcc/list/?series=30327
> > 
> > Yes. Also for gdb this is needed.
> > 
> > Perhaps I did something wrong. On my computer, I could get the first patch
> > working to save rbp, I also applied the patch which should omit the
> > .cfi_undefined. But somehow, I still not get .cfi_undefined for any of the
> > examples.
> > 
> > 
> > $ ./gcc/host-x86_64-pc-linux-gnu/gcc/cc1 -O3
> > gcc/gcc/testsuite/gcc.target/i386/pr38534-7.c -o pr38534-7.S
> > 
> > $ cat pr38534-7.S
> > [...]
> > no_return_to_caller:
> > .LFB0:
> > .cfi_startproc
> > pushq   %rbp
> > .cfi_def_cfa_offset 16
> > .cfi_offset 6, -16
> > movl$array+67108860, %eax
> > xorl%r13d, %r13d
> > [...]
> > 
> > 
> > The ".cfi_undefined 13" is still missing...
> 
> It is generated only when -g is used.


Not on my computer. When I used -g I got:


no_return_to_caller:
.LFB0:
        .loc 1 16 1 view -0
        .cfi_startproc
        .loc 1 17 3 view .LVU1
        .loc 1 18 3 view .LVU2
.LVL0:
        .loc 1 18 26 discriminator 1 view .LVU3
        .loc 1 16 1 is_stmt 0 view .LVU4
        pushq   %rbp
        .cfi_def_cfa_offset 16
        .cfi_offset 6, -16
        movl    $array+67108860, %eax
        .loc 1 21 31 view .LVU5
        xorl    %r13d, %r13d
        .loc 1 16 1 view .LVU6


Still no .cfi_undefined 13. In principle, it should also be generated without
-g, like .cfi_offset and friends.

[Bug target/114116] [14 Regression] Broken backtraces in bootstrapped x86_64 gcc

2024-02-29 Thread lukas.graetz--- via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114116

--- Comment #11 from Lukas Grätz  ---
(In reply to H.J. Lu from comment #10)
> (In reply to Lukas Grätz from comment #9)
> 
> > 
> > Not on my computer. When I used -g I got:
> > 
> > 
> > no_return_to_caller:
> > .LFB0:
> > .loc 1 16 1 view -0
> > .cfi_startproc
> > .loc 1 17 3 view .LVU1
> > .loc 1 18 3 view .LVU2
> > .LVL0:
> > .loc 1 18 26 discriminator 1 view .LVU3
> > .loc 1 16 1 is_stmt 0 view .LVU4
> > pushq   %rbp
> > .cfi_def_cfa_offset 16
> > .cfi_offset 6, -16
> > > movl    $array+67108860, %eax
> > > .loc 1 21 31 view .LVU5
> > > xorl    %r13d, %r13d
> > .loc 1 16 1 view .LVU6
> > 
> > 
> > Still no .cfi_undefined 13. In principle, it should also be generated
> > without -g, as the rest of .cfi_offset and friends.
> 
> Did you apply my patch?  I got
> 
>   .globl  no_return_to_caller
>   .type   no_return_to_caller, @function
> no_return_to_caller:
> .LFB0:
>   .file 1 "pr38534-1.c"
>   .loc 1 16 1 view -0
>   .cfi_startproc
>   .loc 1 17 3 view .LVU1
>   .loc 1 18 3 view .LVU2
> .LVL0:
>   .loc 1 18 26 discriminator 1 view .LVU3
>   .loc 1 16 1 is_stmt 0 view .LVU4
>   subq    $24, %rsp
>   .cfi_undefined 15
>   .cfi_undefined 14
>   .cfi_undefined 13
>   .cfi_undefined 12
>   .cfi_undefined 6
> ...

I applied it, double checked, make distclean, configure, make again.

But your result seems different. Have you applied Jakub Jelinek's patch to save
%rbp? I applied both patches. Perhaps there was some subtle merge-conflict with
the two patches.

[Bug rtl-optimization/38534] gcc 4.2.1 and above: No need to save called-saved registers in 'noreturn' function

2024-02-29 Thread lukas.graetz--- via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=38534

--- Comment #44 from Lukas Grätz  ---
(In reply to Tom Tromey from comment #39)
> (In reply to Lukas Grätz from comment #36)
> 
> > > #2  0x004011d2 in baz (a=a@entry=42, b=b@entry=43, c=c@entry=44,
> > > d=<error reading variable: value has been optimized out>,
> > > e=, f=<error reading variable: value has been optimized out>, g=48, h=49)
> > > at /tmp/1.c:38
> > 
> > 
> > > I can confirm that. What bothers me is the wording "d=<error reading
> > > variable: value has been optimized out>" and not just "d=<optimized out>".
> 
> Could you file a gdb bug about this?  Preferably with some
> kind of test case?

Done. See:

https://sourceware.org/bugzilla/show_bug.cgi?id=31436

[Bug rtl-optimization/38534] gcc 4.2.1 and above: No need to save called-saved registers in 'noreturn' function

2024-02-29 Thread lukas.graetz--- via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=38534

--- Comment #45 from Lukas Grätz  ---
(In reply to Jakub Jelinek from comment #28)
> (In reply to Lukas Grätz from comment #9)
> > Well it is not my testcase. But I added backtracing and observed that the
> > printed backtrace is unchanged with your patch. The new
> > no_return_to_caller():
> 
> You haven't tried hard enough.
> Consider the testcase I've posted to the mailing list, built with -Og -g.
> It is artificial in that register pressure is increased artificially rather
> than coming from meaningful code, noipa attribute is used heavily instead of
> functions being too large or in different TUs, and optimize attribute used
> instead of the noreturn function sitting in a different library, built there
> with -O2, while user program say with -Og.


I found a

movq    %rsp, %rbp
.cfi_def_cfa_register 6

in the assembler output of your example code in function qux(). After that, the
value of %rsp can only be reconstructed from %rbp. Because qux() contains an
alloca whose size is unknown at compile time, we could not reconstruct %rsp
otherwise. So I was ultimately wrong: the value of %rbp is needed to
construct the backtrace in some cases. The only option to still get the
backtrace is to apply your patch to save %rbp (given that .cfi_def_cfa_register
always points to %rbp).

But I guess you know that already.
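
To make the alloca point concrete, here is a minimal sketch of a function whose frame size is only known at run time. It is not the qux() from the mailing-list testcase; fill_and_sum() and its body are made up for illustration. My assumption is that on x86_64 GCC sets up %rbp for such a frame and describes the CFA relative to it, which can be checked with "gcc -O2 -S"; the exact output may vary between compiler versions.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not from the thread: the buffer size is only known
   at run time, so the offset of the stack pointer inside this frame is not
   a compile-time constant. */
size_t fill_and_sum(size_t n)
{
    unsigned char buf[n + 1];          /* variable-length array on the stack */
    volatile unsigned char *p = buf;   /* volatile reads keep the array alive */
    size_t sum = 0;

    memset(buf, 1, n + 1);
    for (size_t i = 0; i <= n; i++)
        sum += p[i];
    return sum;                        /* n + 1 ones were summed */
}
```

If the assembly contains "movq %rsp, %rbp" followed by ".cfi_def_cfa_register 6", an unwinder passing through this frame needs a valid %rbp to find the caller.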

[Bug target/114116] [14 Regression] Broken backtraces in bootstrapped x86_64 gcc

2024-02-29 Thread lukas.graetz--- via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114116

--- Comment #13 from Lukas Grätz  ---
(In reply to H.J. Lu from comment #12)
> (In reply to Lukas Grätz from comment #11)
> > 
> > I applied it, double checked, make distclean, configure, make again.
> > 
> > But your result seems different. Have you applied Jakub Jelinek's patch to
> 
> No.
> 
> > save %rbp? I applied both patches. Perhaps there was some subtle
> > merge-conflict with the two patches.
> 
> Please try just my patch.

Thanks, that worked!

[Bug target/114116] [14 Regression] Broken backtraces in bootstrapped x86_64 gcc

2024-02-29 Thread lukas.graetz--- via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114116

--- Comment #14 from Lukas Grätz  ---
(In reply to Jakub Jelinek from comment #2)
> Created attachment 57545 [details]
> gcc14-pr114116.patch
> 
> This seems to fix it, so far tested just on the small testcase, back to the
> expected backtrace there.

As I said in PR 38534, comment [1], the rsp could be saved to rbp due to an
unknown-sized stack-frame:

movq    %rsp, %rbp
.cfi_def_cfa_register 6

Therefore, if we want the backtrace in such situations, we would need to save
rbp, too, as your patch does. The patch might not even be enough if
.cfi_def_cfa_register can be used with a register other than rbp/6. In that
case, the patch can be dropped and what remains is to disable the optimization
by default, as you already suggested. I think you already have a patch for
that.

H.J. Lu's patch to emit .cfi_undefined is needed in any case. However, the two
patches are currently incompatible.


There also seems to be a bug in libgcc/unwind-dw2.c:249, causing a SEGV when
register values are unavailable due to .cfi_undefined. This is already known,
as the comment there suggests. This happens during a call to glibc's
backtrace(), even though the registers are not needed for the backtrace (in
that case, gdb's backtrace is fine, glibc's backtrace crashes in libgcc). It
should be possible to print best-effort-traces without crashing, in fact,
calling backtrace() should never lead to a crash. Bug 103510 might be related
and this should be fixed independently.

Thanks for the work you put into this, and I am sorry for the mess on my side!

[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=38534#c45

[Bug c/111643] New: attribute((flatten)) with -O1 runs out of memory (killed cc1)

2023-09-29 Thread lukas.graetz--- via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111643

Bug ID: 111643
   Summary: __attribute__((flatten)) with -O1 runs out of memory
(killed cc1)
   Product: gcc
   Version: 13.2.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: lukas.gra...@tu-darmstadt.de
  Target Milestone: ---

Created attachment 56017
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=56017&action=edit
C file

When I run

gcc -c -O0 runs_out_of_memory.i -o runs_out_of_memory.o

(see the attached .i file) everything is fine. But when I run

gcc -c -O1 runs_out_of_memory.i -o runs_out_of_memory.o

then I get:

gcc: fatal error: Killed signal terminated program cc1

Apparently, cc1 quickly runs out of memory. I have 16 GB of RAM and the program
is rather simple. I tested it with gcc versions 9.4.0, 5.1 and 13.2 (target
x86_64-linux-gnu) on Ubuntu 20.04.

I believe the problem is the __attribute__((flatten)) on several methods.

How I created the source file: The code comes from busybox (file
coreutils/expr.c) and musl header files. Additionally, I replaced every
function 'name' with 'name_original' and added a wrapper with
__attribute__((flatten)), for later instrumentation (I did this with a script).
I used that attribute to reduce the overhead of the wrapper functions, and I
believe this should be fine. I introduced the wrappers in the first place to
allow fine-grained instrumentation of these functions.

[Bug ipa/111643] attribute((flatten)) with -O1 runs out of memory (killed cc1)

2023-09-29 Thread lukas.graetz--- via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111643

--- Comment #3 from Lukas Grätz  ---
(In reply to Marc Glisse from comment #2)
> (In reply to Andrew Pinski from comment #1)
> > I am 99% sure this is falls under don't do this as flatten inlines
> > everything it can that the function calls ...
> 
> Maybe people end up abusing flatten because we are missing a convenient way
> for a caller to ask that a call be inlined? From the callee, we can use
> always_inline (couldn't this be used on name_original in this testcase?),
> but from the caller... Here even a non-recursive version of flatten would
> have helped.

Yes, this was what I was searching for, but I found only flatten. Also, that
flatten is applied recursively is not mentioned in the documentation and it is
also not what I would expect.

https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html

I don't want to always_inline name_original. What I want is to only inline
name_original when called by the wrapper function name, hence the flatten.
Because I replace every call to name with name_original where I don't want to
apply the instrumentation by the wrapper function name.

Thanks!
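
One possible workaround for the "inline only in the wrapper" wish, sketched under my own assumptions and not proposed in this thread: move the body into a static always_inline helper and let both entry points call it. The names name_body and call_count are hypothetical, and call_count merely stands in for the real instrumentation.

```c
/* Stand-in for the real instrumentation state (hypothetical). */
unsigned long call_count;

/* The shared body. always_inline asks GCC to inline it into each caller. */
static inline __attribute__((always_inline)) int name_body(int x)
{
    return 2 * x + 1;
}

/* Uninstrumented entry point, used where no tracing is wanted. */
int name_original(int x)
{
    return name_body(x);
}

/* Instrumented wrapper: gets its own inlined copy of the body. */
int name(int x)
{
    call_count++;
    return name_body(x);
}
```

The cost is that the body is emitted twice, so this trades code size for the saved call, and unlike flatten it does not pull in anything that name_body() itself calls.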

[Bug ipa/111643] attribute((flatten)) with -O1 runs out of memory (killed cc1)

2023-09-30 Thread lukas.graetz--- via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111643

--- Comment #4 from Lukas Grätz  ---
Sorry, just to clarify whether I understood your two comments correctly:
Should foo() be inlined in the following example because flatten works
recursively?

void foo (void) {
    // CODE
}

int bar_original (void) {
    // CODE
    foo();
    // CODE
}

__attribute__((flatten))
int bar (void) {
    // INSTRUMENTATION CAN GO HERE
    return bar_original();
}

I thought that according to the documentation of flatten, foo() would not be
affected by the flatten attribute of bar(). It says: "For a function marked
with this attribute, every call inside this function is inlined, if possible."
The call to foo() is not directly inside the function bar(). Only if
bar_original() had also the __attribute__((flatten)), I would expect foo() to
be made inline in bar() because of recursive flatten. Of course, it could still
be inlined because some heuristics...

[Bug ipa/111643] attribute((flatten)) with -O1 runs out of memory (killed cc1)

2023-10-06 Thread lukas.graetz--- via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111643

--- Comment #9 from Lukas Grätz  ---
Thanks for everything, it seemed to be a misunderstanding from my side anyway
and the documentation fix should help others.

I am sorry for being silent, I was sick for a few days. As for my original
problem, I am thinking of opening a new report, because I realized there could
be another solution without flatten. To explain a bit more, we have
bar_original() and bar_new(); the latter should behave identically to the former
except for one additional statement, the "instrumentation". Since the
instrumentation can be done in only two assembler instructions, the overhead of
bar_new() calling bar_original() is not negligible.

#include <stdint.h>

int bar_original (int x) { /* CODE */ }

unsigned int trace_buffer[512];
uint8_t trace_pos;

#define FUNCTION_NUMBER_bar 0x686
int bar_new (int x) {
    trace_buffer[trace_pos++] = FUNCTION_NUMBER_bar; // instrumentation
    return bar_original(x);
}

My idea: Do not touch the stack inside bar_new() and replace the call in
bar_new() with a jump or better a fall-through to bar_original(). This is
possible, because both functions have the same signature. It could save around
4 instructions and some stack memory. I have a lot of such functions after my
instrumentation step.

I also wondered whether

int bar_alias (void) { return bar_original(); }

could be a portable alternative to attribute alias. Except that current GCC
does not translate it that way.
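
A self-contained version of the instrumentation idea above, as a sketch only. The body of bar_original() is a placeholder, and I sized the buffer at 256 entries (an assumption, the original uses 512) so that the uint8_t index wraps exactly at the end of the array.

```c
#include <stdint.h>

#define FUNCTION_NUMBER_bar 0x686u

/* 256 entries: a uint8_t index can never run past the end. */
unsigned int trace_buffer[256];
uint8_t trace_pos;

/* Placeholder body; noinline keeps the call visible in the assembly. */
__attribute__((noinline)) int bar_original(int x)
{
    return x + 1;
}

int bar_new(int x)
{
    trace_buffer[trace_pos++] = FUNCTION_NUMBER_bar;  /* instrumentation */
    return bar_original(x);                           /* candidate sibling call */
}
```

Because both functions share one signature, the final call is in tail position, which is what would allow a jump instead of a call plus return.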

[Bug ipa/111643] attribute((flatten)) with -O1 runs out of memory (killed cc1)

2023-10-06 Thread lukas.graetz--- via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111643

--- Comment #11 from Lukas Grätz  ---
(In reply to Alexander Monakov from comment #10)
> (In reply to Lukas Grätz from comment #9)
> > I also wondered whether
> > 
> > int bar_alias (void) { return bar_original(); }
> > 
> > could be a portable alternative to attribute alias. Except that current GCC
> > does not translate it that way.
> 
> That's because function addresses are significant and so
> 
>   &bar_alias == &bar_original
> 
> must evaluate to false, but would be true for aliases.
> 
> In theory compilers could do better by introducing fall-through aliases:
> https://gcc.gnu.org/wiki/
> cauldron2019talks?action=AttachFile&do=view&target=fallthrough-aliases.pdf

Thanks a lot! I haven't thought about function addresses. Is there hope that
fall-through aliases get into gcc? Then perhaps my instrumentation
fall-through would also be possible to implement.
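
The address argument can be sketched with the GNU alias attribute. This assumes a target with alias support (for example ELF on Linux); the helper names alias_has_same_address() and wrapper_has_same_address() are mine, added only to make the comparison observable.

```c
int bar_original(int x)
{
    return x + 1;
}

/* GNU alias attribute: bar_alias is a second name for the same code. */
int bar_alias(int x) __attribute__((alias("bar_original")));

/* A forwarding wrapper is a distinct function with its own address. */
int bar_wrapper(int x)
{
    return bar_original(x);
}

/* Nonzero if the alias and the original compare equal as pointers. */
int alias_has_same_address(void)
{
    return bar_alias == bar_original;
}

/* Nonzero if the wrapper and the original compare equal as pointers. */
int wrapper_has_same_address(void)
{
    return bar_wrapper == bar_original;
}
```

The wrapper has to keep a distinct address, so the compiler cannot silently turn it into an alias even though both behave the same when called.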

[Bug ipa/111643] attribute((flatten)) with -O1 runs out of memory (killed cc1)

2023-10-06 Thread lukas.graetz--- via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111643

--- Comment #14 from Lukas Grätz  ---

(In reply to Andrew Pinski from comment #13)
> (In reply to Andrew Pinski from comment #12)
> > Gcc does have tail call optimization which should allow the instrumentation
> > with less overhead. Though tail call optimization happens at -O2 and above
> > only (by default).
> 
> The only improvement to this would be fall through alias which allows the
> removal of the jump to the other function. A direct non-conditional jump is
> usually predictable so the overhead should be small still.

Thanks! I thought that there was still some stack involved also causing some
overhead for every function call (in comparison to a pure non-conditional
jump). When I have time next week, I will try to look into that in detail.

[Bug c/111786] New: No tail recursion for simple program

2023-10-12 Thread lukas.graetz--- via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111786

Bug ID: 111786
   Summary: No tail recursion for simple program
   Product: gcc
   Version: 13.2.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: lukas.gra...@tu-darmstadt.de
  Target Milestone: ---

Created attachment 56096
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=56096&action=edit
C code of expr_main

Follow up with nearly the same source file as 111643, only without the flatten
attribute. Sorry for taking so long for that. I learned the optimized compiler
should output a tail recursion. But this seems not to be the case: With "sub"
and "call", 16 bytes on the stack are used.

The attached file contains:

---
int expr_main(int argc, char **argv)
{
return expr_main_original(argc, argv);
}
---

And after cc1 -O3 on amd64, the output contains:

-- gcc 13.2.0 --
expr_main:
subq    $8, %rsp
call    expr_main_original
---

-- gcc 9.4.0 shipped with ubuntu 20.04 ---
expr_main:
endbr64
pushq   %rax
popq    %rax
pushq   %rax
call    expr_main_original
---

-- Expected --
expr_main:
jmp expr_main_original
---

If I compile the above snippet only, I get the expected result. But not when
compiling the whole C file which also includes the body of
expr_main_original(). I also suspect there are some other factors I don't know,
since many other functions I tested yield the expected result.

In my case, the overhead seems to be negligible. However, I think it should be
possible to construct similar recursive programs where the overhead compared to
tail recursion is not negligible.
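
For comparison, a reduced wrapper that I would expect to compile to a plain "jmp expr_main_original" with "gcc -O2 -S" on amd64, as long as the callee is not found to be noreturn. The callee body and the calls_seen counter are stand-ins, not the busybox code from the attachment.

```c
/* Side effect so the callee is not optimized away (hypothetical). */
int calls_seen;

/* Stand-in for the real function; it returns normally, so the compiler
   has no reason to treat it as noreturn. */
__attribute__((noinline)) int expr_main_original(int argc, char **argv)
{
    calls_seen++;
    return argv ? argc : -argc;
}

/* Wrapper with an identical signature: the call is in tail position. */
int expr_main(int argc, char **argv)
{
    return expr_main_original(argc, argv);
}
```

Replacing the callee body with code that never returns is one way to check whether the "call" shows up again.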

[Bug c/111786] No tail recursion for simple program

2023-10-12 Thread lukas.graetz--- via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111786

--- Comment #3 from Lukas Grätz  ---
(In reply to Jakub Jelinek from comment #1)
> We completely intentionally don't emit tail calls to noreturn functions, so
> that e.g. in case of abort one doesn't need to virtually reconstruct
> backtrace.
> In your case, the interprocedural optimizations determine expr_main_original
> is noreturn and so calls it normally (and optimizes away anything after that
> call).

Thank you very much indeed! (Ah yes, this also explains why there is no
"ret".) And sorry for not realizing that this is a duplicate. So the "call" is
intentionally emitted by gcc for a better debugging experience. I agree, this
makes perfect sense in many cases.

However, the price of better debugging seems to be the danger of a stack
overflow. After I understood your "complete" intention, it took me about 20 min
to construct an example that produces a stack overflow following that intention.

---
void foo(int n) {
    if (n == 0)
        exit(0);
    int x[200];
    for (int i = 0; i < 200; i++)
        extern_function(x[i], x[199-i]);
    return foo(n-1);
}
---

After adding __attribute__((noreturn)), compiling with -O3 and passing 1 to
foo(), I get a segmentation fault. There is still a warning "function declared
‘noreturn’ has a ‘return’ statement". But in our case, the noreturn attribute
is not wrong, because none of the recursive calls actually returns. This
might be something that interprocedural optimizations detect in the future. So
even without attribute noreturn, gcc could decide to produce no tail recursion
(because it is a noreturn function, regardless of the noreturn attribute).

Last remark, then I remain silent. I just learned that clang actually has the
attribute musttail which would help for my reported C file as well as in the
foo() example above to prevent the stack overflow. But I guess it is not
planned to add musttail to gcc?
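
A sketch of how the musttail statement attribute is used, guarded so that it also compiles where the attribute is unknown. clang documents this attribute; that newer GCC releases accept the same spelling is my assumption, which is why the code checks __has_attribute first. MUSTTAIL and sum_down() are hypothetical names.

```c
/* Use musttail only if the compiler reports support for it. */
#if defined(__has_attribute)
#  if __has_attribute(musttail)
#    define MUSTTAIL __attribute__((musttail))
#  endif
#endif
#ifndef MUSTTAIL
#  define MUSTTAIL   /* fallback: an ordinary return */
#endif

/* Adds n + (n-1) + ... + 1 to acc; the recursive call is in tail position. */
long sum_down(long n, long acc)
{
    if (n == 0)
        return acc;
    MUSTTAIL return sum_down(n - 1, acc + n);
}
```

With the attribute in effect, the compiler has to reject the code rather than fall back to a normal call, so stack use stays constant regardless of n.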

[Bug c/111896] New: call with wrong stack alignment

2023-10-20 Thread lukas.graetz--- via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111896

Bug ID: 111896
   Summary: call with wrong stack alignment
   Product: gcc
   Version: 9.4.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: lukas.gra...@tu-darmstadt.de
  Target Milestone: ---

Created attachment 56157
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=56157&action=edit
ccnhCTdD.i.tmp.i

For some reason, I managed to get a SEGV when running a program. I spent time
debugging it, and found out that the problem was when executing:

movaps %xmm0,0x40(%rsp)

It took me some time, but I realized the SEGV was caused by rsp being 8 bytes
off: it should be aligned to 16 bytes. I also found out where the misalignment
happened.

See the attached file. dlist_free_original() is calling freeit(). This is
compiled as dlist_free_original.constprop.0 calling do_line() as follows:

dlist_free_original.constprop.0:
...
pushq   %rbp
...
pushq   %rbx
...
call    do_line

So the stack is misaligned when the call happens. It might be because do_line()
is written in inline asm with __attribute__((naked)).

Starting with gcc 11.3, there seems to be an extra "sub rsp,8" which seems to
solve this. But I was using gcc 9.4.0 (shipped with ubuntu 20.04) on amd64
linux. A quick check on godbolt showed me that the misalignment still happens in gcc
11.2. So I am unsure if this is still relevant but I am reporting just in case.

gcc -O3 -c -S ccnhCTdD.i.tmp.i -o tmp.s

If you need the full executable or anything else, ask me.

Background:

I wanted to have a way to record which functions were called through a
pointer. For that, I created a wrapper for every function, renaming the
original function to ..._original. I also created a macro renaming direct calls
to _original so that only calls through a pointer were left. The wrapper
functions are doing their logging (it takes only a few instructions) and then
sibcall to the respective original function. A wrapper for vararg functions
seemed to be only possible using asm, so I used asm. Since the other functions
might be static, I had to do inline asm with attribute naked.

[Bug target/111896] call with wrong stack alignment

2023-10-20 Thread lukas.graetz--- via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111896

--- Comment #2 from Lukas Grätz  ---
(In reply to Andrew Pinski from comment #1)
> No I think you are looking into the wrong location.
> 
> When a call happens, it pushes a value on the stack aligning the stack that
> is incoming into that function.
> 
> In the case of GCC 11.3 and above, there is inlining happening.

Well, I could be mistaken. But I couldn't see the inlining.

In GCC 11.3 and above I get something like:
==
dlist_free_original.constprop.0:
push    rbp
push    rbx
...
sub     rsp, 8
...
call    do_line
==

In GCC 11.2 and below it is something like:
==
dlist_free_original.constprop.0:
push    rbp
...
push    rbx
...
call    do_line
==

And I checked with the gdb debugger that the rsp is indeed misaligned at the
start of do_line(). The alignment was OK at the start of
"dlist_free_original.constprop.0".

==
$ gdb busybox_unstripped
GNU gdb (Ubuntu 9.2-0ubuntu1~20.04.1) 9.2
...
(gdb) break dlist_free_original.constprop.0
Breakpoint 1 at 0x59a7ac
(gdb) break do_line
Breakpoint 2 at 0x59a474
(gdb) run patch -R -i input.patch

Breakpoint 1, 0x0059a7ac in dlist_free_original.constprop ()
(gdb) i r rsp
rsp            0x7fffd998  0x7fffd998
(gdb) c
Continuing.

Breakpoint 2, 0x0059a474 in do_line ()
(gdb) i r rsp
rsp            0x7fffd980  0x7fffd980
==
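
The two rsp values can be checked against the x86-64 System V rule with a little arithmetic. The helper name entry_rsp_is_abi_aligned() is made up for this sketch; the rule it encodes is that rsp is 16-byte aligned right before a call and the call then pushes an 8-byte return address.

```c
#include <stdint.h>

/* At function entry the return address has just been pushed, so a
   correctly aligned caller leaves (rsp + 8) divisible by 16. */
int entry_rsp_is_abi_aligned(uint64_t rsp_at_entry)
{
    return (rsp_at_entry + 8) % 16 == 0;
}
```

For 0x7fffd998 (entry of dlist_free_original.constprop.0) the sum ends in 0x9a0, which is a multiple of 16. For 0x7fffd980 (entry of do_line) it ends in 0x988, which is 8 bytes off, matching the misalignment described above.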

[Bug target/111896] call with wrong stack alignment

2023-10-20 Thread lukas.graetz--- via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111896

--- Comment #5 from Lukas Grätz  ---
Thanks a lot!

[Bug rtl-optimization/38534] gcc 4.2.1 and above: No need to save called-saved registers in 'noreturn' function

2024-01-14 Thread lukas.graetz--- via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=38534

Lukas Grätz  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |lukas.graetz@tu-darmstadt.de

--- Comment #6 from Lukas Grätz  ---
(In reply to Sam James from comment #5)
> (In reply to thutt from comment #2)
> PR10837 has some discussion on this point too.

The debugging argument there was for the backtrace. For that we only need to
follow the calling conventions to save the stack and instruction pointers.
Other registers, including callee-saved registers like r12, r13, r14 and r15,
are not used in a backtrace.

[Bug rtl-optimization/38534] gcc 4.2.1 and above: No need to save called-saved registers in 'noreturn' function

2024-01-14 Thread lukas.graetz--- via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=38534

--- Comment #7 from Lukas Grätz  ---
(In reply to H.J. Lu from comment #4)
> When I compiled __cxxabiv1::__cxa_throw, which is a noreturn function in
> libstdc++-v3/libsupc++/eh_throw.cc not to save callee-saved registers,
> most of C++ exception tests crashed.

Can you tell how you compiled it? Thanks in advance!

[Bug rtl-optimization/38534] gcc 4.2.1 and above: No need to save called-saved registers in 'noreturn' function

2024-01-14 Thread lukas.graetz--- via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=38534

--- Comment #9 from Lukas Grätz  ---
(In reply to H.J. Lu from comment #8)
> (In reply to Lukas Grätz from comment #7)
> > (In reply to H.J. Lu from comment #4)
> > > When I compiled __cxxabiv1::__cxa_throw, which is a noreturn function in
> > > libstdc++-v3/libsupc++/eh_throw.cc not to save callee-saved registers,
> > > most of C++ exception tests crashed.
> > 
> > Can you tell how you compiled it? Thanks in advance!
> 
> I have a patch to fix it. Please try users/hjl/pr113312/gcc-13 branch:
> 
> 
> For your testcase, I got

Well it is not my testcase. But I added backtracing and observed that the
printed backtrace is unchanged with your patch. The new no_return_to_caller():

void __attribute__((noreturn))
no_return_to_caller(int a, int b, int c, int d)
{
   LOOP_BODY;

#define BT_BUF_SIZE 100
   void *buffer[BT_BUF_SIZE];
   backtrace_symbols_fd(buffer, backtrace(buffer, BT_BUF_SIZE), STDOUT_FILENO);

   while (1);
}

What I observed from the assembly is that %rbp is not saved, whereas %rip and
%rsp are still implicitly saved by the call instruction. But since glibc's
backtrace implementation does not use %rbp, this is fine.

Some amateur speculation, just ignore it: I don't know whether %rbp is the
source of the failed C++ test cases, which also do some stack unwinding. After
looking in the System V ABI specification I am still unsure whether stack
unwinding relies on %rbp or not. Perhaps there is an unnecessary dependency on
%rbp or a missing "-fno-omit-frame-pointer" somewhere in the gcc internals that
causes the problem.

[Bug rtl-optimization/38534] gcc 4.2.1 and above: No need to save called-saved registers in 'noreturn' function

2024-01-14 Thread lukas.graetz--- via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=38534

--- Comment #11 from Lukas Grätz  ---
(In reply to H.J. Lu from comment #10)
> The C++ test issue is caused by missing callee-saved registers for
> exception supports in noreturn functions in libstdc++.  I fixed it by
> keeping callee-saved registers when exception is enabled.
> 
> Backtrace with %rbp is unrelated to this.  Gcc will skip %rbp without
> -fno-omit-frame-pointer.

Great! Then I guess there is no pitfall in your patch.

[Bug rtl-optimization/38534] gcc 4.2.1 and above: No need to save called-saved registers in 'noreturn' function

2024-01-15 Thread lukas.graetz--- via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=38534

--- Comment #12 from Lukas Grätz  ---
(In reply to H.J. Lu from comment #10)
> The C++ test issue is caused by missing callee-saved registers for
> exception supports in noreturn functions in libstdc++.  I fixed it by
> keeping callee-saved registers when exception is enabled.


I guess that exception throwing needs callee-saved registers, because it uses
stack unwinding to do something very similar to a return.


void f1(void) {
  CODE, compiler translation uses callee-saved %r12
  f2();
  CODE, compiler translation uses callee-saved %r12
}

void f2(void) {
  f3();
}

void f3(void) {
  CODE, compiler translation uses callee-saved %r12
  f4();
  while(1);
}

void f4(void) {
  CODE, uses loop unwinding functions
a) restores all callee-saved registers in f3(), f2()
b) restores %rsp and %rip from stack of f2()
  unreachable();
}

In effect, b) is a return from the call f2() in f1(), although it happens in
f4().

%r12 needs only to be saved in f1() and f3(). Gcc with -O2 would do that.
However, with your patch, %r12 would not be saved in f3() anymore. This can
lead to crashing in the second CODE block in f1().

The solution should require __attribute__((nothrow)) in addition to noreturn in
your optimization patch. The b) in f4() should/would be treated as a throw. So
none of f1(), f2(), f3() should have the attribute nothrow.

So in the example of this report, the signature of value() should be modified
to:

extern __attribute__((nothrow)) unsigned value(int i, int j, int k);

Only then it is safe to skip saving callee-saved registers. "nothrow" should
also be added to bar() and fn() in your test case pr38534-2.c.
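
For contrast, a sketch of my own (not from the thread) using setjmp/longjmp, which is a different mechanism from table-driven unwinding. setjmp stores the callee-saved registers in the jmp_buf at the resume point, so longjmp does not depend on any save slots in the frames it skips. An unwinder has no such snapshot and has to rely on the .cfi_offset records of every skipped frame, which is why it matters there whether f3() saves %r12. The function names mirror the pseudocode above; the arithmetic is arbitrary.

```c
#include <setjmp.h>

static jmp_buf resume_point;

/* Jumps straight back into f1(), skipping f3() and f2(). */
static void f4(void)
{
    longjmp(resume_point, 1);
}

/* Behaves like a noreturn function: control never comes back here. */
static void f3(void)
{
    f4();
    for (;;) { }
}

static void f2(void)
{
    f3();
}

int f1(int seed)
{
    int kept = seed * 3;           /* may live in a callee-saved register */

    if (setjmp(resume_point) == 0)
        f2();                      /* control comes back via longjmp */
    return kept + 1;               /* kept was not modified after setjmp */
}
```

Here the value of kept survives regardless of what f3() does with callee-saved registers, because the restore comes from the jmp_buf and not from f3()'s frame.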

[Bug rtl-optimization/38534] gcc 4.2.1 and above: No need to save called-saved registers in 'noreturn' function

2024-01-15 Thread lukas.graetz--- via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=38534

--- Comment #13 from Lukas Grätz  ---
(In reply to Lukas Grätz from comment #12)

>   CODE, uses loop unwinding functions
>a) restores all callee-saved registers in f3(), f2()
>b) restores %rsp and %rip from stack of f2()

I meant stack unwinding. f3() and f2() can be in separate compilation units;
all the unwinder needs is the ".cfi_offset REGISTER, OFFSET" information from
the ELF file (it is also visible in the generated assembly).

[Bug rtl-optimization/38534] gcc 4.2.1 and above: No need to save called-saved registers in 'noreturn' function

2024-01-15 Thread lukas.graetz--- via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=38534

--- Comment #14 from Lukas Grätz  ---
Never mind my above comments. I just realized that attribute nothrow has no
effect in C unless -fexceptions is given. So nothrow is not needed (only
-fno-exceptions). Furthermore, most noreturn functions throw in C++, so there
would be little potential for optimization when exceptions are enabled.

What puzzles me, is that functions like exit() have different signatures in C
and C++. With "gcc -E -fexceptions somefile.cc" I get

  extern void exit (int __status)
throw () __attribute__ ((__noreturn__));

in C++ and in C I get with "gcc -E -fexceptions somefile.c"

  extern void exit (int __status)
__attribute__ ((__nothrow__ , __leaf__)) __attribute__ ((__noreturn__));

, although exceptions are explicitly enabled in both cases. But I guess this is
a problem in Glibc, not GCC.

I will really shut up now, promise!

[Bug rtl-optimization/38534] gcc 4.2.1 and above: No need to save called-saved registers in 'noreturn' function

2024-02-01 Thread lukas.graetz--- via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=38534

--- Comment #25 from Lukas Grätz  ---
(In reply to Jakub Jelinek from comment #19)
> (In reply to H.J. Lu from comment #18)
> > (In reply to Jakub Jelinek from comment #17)
> > > E.g. shouldn't it at least be disabled for -O0 and -Og and shouldn't we
> > 
> > We can disable this for -O0 and -Og.
> 
> I think we should go for that.
> 

This is independent from debugging, but I thought the patch was only meant for
-O3. Have you thought about the following situation:


Compile with gcc -O1 (and -fno-exceptions is implicit):
--- library.c ---
void __attribute__((noreturn))
foo(void (*bar)(void)) {
    ...
    bar();
    while (1);
}
---


Compile with g++ (and -fexceptions is implicit):
--- app.c++ ---
#include <exception>

extern void foo(void (*bar)(void));
extern void bar_throws_exception(void) throw ();

int main() {
    ...
    try
    {
        foo(bar_throws_exception);
    }
    catch (const std::exception& e)
    {
        ...
    }
}
---


It is not hard to fill in the ... to make it use some callee-saved registers (e.g.
with LOOP_BODY as in this issue report). And then the program would crash.

One might argue that either the library.c is to blame for the missing
-fexceptions? Or that the app.c++ is to blame because it should not call foo
with an argument function that might throw an exception? I am unsure if the C++
standard actually forbids calling C library functions with argument functions
that might throw an exception.

So I think it would be safer to restrict the patch to -O3. But I really don't
know much about this.

[Bug rtl-optimization/38534] gcc 4.2.1 and above: No need to save called-saved registers in 'noreturn' function

2024-02-01 Thread lukas.graetz--- via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=38534

--- Comment #27 from Lukas Grätz  ---
(In reply to Jakub Jelinek from comment #26)
> (In reply to Lukas Grätz from comment #25)
> > (In reply to Jakub Jelinek from comment #19)
> > > (In reply to H.J. Lu from comment #18)
> > > > (In reply to Jakub Jelinek from comment #17)
> > > > > E.g. shouldn't it at least be disabled for -O0 and -Og and shouldn't 
> > > > > we
> > > > 
> > > > We can disable this for -O0 and -Og.
> > > 
> > > I think we should go for that.
> > > 
> > 
> > This is independent from debugging, but I thought the patch was only meant
> > for -O3. Have you thought about the following situation:
> 
> Throwing an exception through -fno-exceptions code is UB, don't do that.

Thanks for the info!

I guess this UB is implicit in GCC's documentation for "-fexceptions":

"However, you may need to enable this option when compiling C code that needs
to interoperate properly with exception handlers written in C++."

I would suggest removing the "may" and being clearer:

"However, you need to enable this option when compiling C code that could
(implicitly) propagate an exception (from C++) to an exception handler written
in C++."
