[Bug c/116642] New: miscompilation involving vfork child.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116642

Bug ID: 116642
Summary: miscompilation involving vfork child.
Product: gcc
Version: 12.2.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
Assignee: unassigned at gcc dot gnu.org
Reporter: joshudson at gmail dot com
Target Milestone: ---

Created attachment 59073
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=59073&action=edit
main reproduction source file

GCC Version: gcc (Debian 12.2.0-14) 12.2.0

Build Command: gcc -o nutty -Wall -Wl,-T,minipie.ld -Wl,--no-dynamic-linker -pie -nostdlib -nostartfiles -O3 -s -fpic -fno-asynchronous-unwind-tables -ffreestanding package.s nutty.i

The resulting binary doesn't work: the first array member passed to execve() is out of bounds. This is easily observed with strace -f:

execve("./nutty", ["./nutty"], 0x7fffc4882e28 /* 41 vars */) = 0
open("uid.db", O_RDWR|O_CREAT, 0600)= 3
flock(3, LOCK_EX) = 0
read(3, "", 4096) = 0
vfork(strace: Process 19097 attached
[pid 19097] execve("/bin/false", [0x82d, "-md", "/u/U/home/U", "-k", "/u/U/etc/skel", "-s", "", "-u", "6", "-g", "urun", "U"], NULL) = -1 EFAULT (Bad address)
[pid 19097] exit(0) = ?
[pid 19096] <... vfork resumed>)= 19097
[pid 19097] +++ exited with 0 +++
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=19097, si_uid=1000, si_status=0, si_utime=0, si_stime=0} ---
wait4(19097, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 19097
exit(14)= ?
+++ exited with 14 +++

(0x82d is bogus...)

I have had enormous difficulty reducing this; it behaves very much as if there needs to be a certain weight to the code in the function or the miscompilation disappears. It's almost like it has something to do with the xmm variables, but I haven't been able to locate the definite problem in the assembly file. This code has already been reduced far beyond sensibility and won't make sense to you; it's no longer doing something meaningful.
Either of the following small changes results in the code working:

1) move the declaration of argv[] outside the if vfork() statement
2) put a writeblock function call inside the vfork() if block before execve. I only tried it after the argv[] declaration; never before.

For some reason the compiler thinks _exit() returns; this is probably a good thing, otherwise there's a chance it would overlap argv[] with something vital.

There's been some discussion online (unfortunately marred by a vfork opponent) as to whether or not this is the signature of undefined behavior. However, no undefined behavior has been found, and we can't come up with a remotely plausible hypothesis for how undefined behavior could cause this particular malfunction; all workable hypotheses would fail after a successful execve, not upon calling it.
[Bug c/116642] miscompilation involving vfork child.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116642 --- Comment #1 from Joshua --- Created attachment 59074 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=59074&action=edit runtime required to build nutty.i all the way to an executable
[Bug c/116642] miscompilation involving vfork child.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116642 --- Comment #2 from Joshua --- Created attachment 59075 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=59075&action=edit linker script used to build nutty.i (don't know whether it is required to build the executable)
[Bug c/116642] miscompilation involving vfork child.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116642 --- Comment #8 from Joshua --- I am absolutely certain declaring new local variables inside the vfork() block is kosher. Modifying variables outside the vfork() block simply requires marking them volatile; in this the rule is the same as setjmp().
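The setjmp() half of that comparison can be sketched in C. This is an illustrative example only, not code from the reproduction, and the name counter_after_longjmp is made up here. Under the C standard, a non-volatile automatic object that is modified between setjmp() and longjmp() has an indeterminate value after the jump, so the local is declared volatile. The sketch says nothing about vfork() itself.

```c
#include <setjmp.h>

static jmp_buf env;

/* Hypothetical helper: modifies a local between setjmp() and longjmp()
 * and returns the value observed after the jump. */
int counter_after_longjmp(void)
{
    /* volatile: the value stays well defined across the longjmp() */
    volatile int counter = 0;

    if (setjmp(env) == 0) {
        counter = 42;      /* written after setjmp() first returned 0 */
        longjmp(env, 1);   /* jump back; setjmp() now returns 1 */
    }
    return counter;
}
```

Without the volatile qualifier the compiler is allowed to keep counter in a register whose contents are restored by longjmp(), in which case the returned value would be unspecified.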
[Bug c/116642] miscompilation involving vfork child.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116642 --- Comment #9 from Joshua --- Update: vfork is not the problem. If the function is _called_ vfork the bug happens. The bug remains if I update package.s to use __NR_fork instead of __NR_vfork.
[Bug c/116642] miscompilation involving vfork child.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116642 --- Comment #11 from Joshua ---

So far I have the following. I have modified the asm stub so that vfork() is an alias for fork(); of course gcc doesn't know about that, and that's the point.

If I remove -ffreestanding I can verify the asm is correct but the code doesn't work for other reasons I'm not in a position to fix. If I compile with -mno-sse -mno-sse2 I can verify the asm is correct and this time the binary works. The generated asm with sse looks dodgy but I haven't been able to find a definite fault yet.

-ffreestanding does two things:
1) It removes some basic assumptions about what functions do; and
2) it prevents gcc from calling library functions implicitly.
I only need the second behavior to suppress internally generated calls to memset.

I *think* the bug has something to do with assumptions about what vfork does that just aren't true because it's missing part of the internal name->function sense when -ffreestanding is used.

Tell you what: I can try placing volatile on argv[] immediately; if that fixes the problem I'm willing to shut up and say that's perfectly reasonable; on the other hand if it doesn't fix the problem it's time for somebody else to look at it.
[Bug c/116642] miscompilation involving vfork child.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116642 --- Comment #12 from Joshua ---

I located the actual bug in the disassembly. While it is somehow triggered by vfork, it is not due to vfork or vfork rules. The bad assembly code is here.

.LC16:
        .quad   .LC7

You can't *do* that in this environment. No relocations allowed.
[Bug c/116642] miscompilation involving vfork child.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116642 --- Comment #14 from Joshua --- Is this just a doc problem then? It says I must use -fpie to make a relocatable executable. I need a relocatable executable with no relocations. There is nothing in source code requiring a relocation, so I am surprised at gcc trying to emit one. It's too bad I don't get a compile error if I try to write something like this: static char *str = "/bin/false";
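To illustrate why a declaration like that one matters here, the following is a minimal sketch, not taken from nutty.i, and all names in it are hypothetical. A pointer object with a static initializer has to hold an absolute address, which in a position-independent executable is normally filled in by a load-time relocation; a character array stores the bytes in place and holds no address. The pointer is made a non-static global so that the compiler cannot simply optimize the object away.

```c
#include <string.h>

/* Pointer object in writable data: its initial value is the address of
 * the string literal, so a PIE is expected to need a relocation for it
 * (typically R_X86_64_RELATIVE on x86-64; this is an assumption about
 * the usual toolchain behavior, not something shown in the report). */
const char *path_ptr = "/bin/false";

/* Array object: the characters are stored directly, no address is
 * stored, so no relocation should be needed for the data itself. */
const char path_arr[] = "/bin/false";

const char *path_via_pointer(void) { return path_ptr; }
const char *path_via_array(void)   { return path_arr; }
```

On a typical x86-64 Linux toolchain, building this with -fpie -pie and inspecting the result with readelf -r would be expected to show an entry for path_ptr and none for path_arr, though the exact output depends on the compiler and linker versions.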
[Bug c/116642] miscompilation involving vfork child.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116642 --- Comment #16 from Joshua --- I took out -fpie and the output assembly was different and the binary started working. That is contrary to the documentation which says you need -fpie for position independent executables. I suppose it would be according to the documentation if I compiled with -fpic -fno-pie and linked with -fpie but it doesn't make sense to me why the difference *matters*. Still feels very weird; like it's a different version of arbitrary changes that randomly caused it to not generate the relocation; like so many hundreds of changes trying to produce the minimal case that generated it. If I *knew* it was sse that's generating relocations behind my back I'd just turn it off and be done with it, but I don't. I only know it's one of many CPU features.
[Bug c/116642] miscompilation involving vfork child.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116642 --- Comment #18 from Joshua --- I stand corrected. I removed -pie. The resulting output binary is still relocatable in memory. I don't think the Kernel is willing to load an ELF binary at address 0, and that's the only other option with this header. (I ran hexdump on the binary to verify) Hmmm that's really weird. The way you describe it that shouldn't change the compiler pass.
[Bug c/116642] miscompilation involving vfork child.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116642 --- Comment #21 from Joshua --- Well then; I guess I'm having pain every time I upgrade the compiler no matter what.
[Bug c/116642] miscompilation involving vfork child.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116642 --- Comment #25 from Joshua --- Guys, I goofed and don't know what I actually did. I failed to reset one of the other hypotheses after finding the problem in the disassembly. On re-unpacking the archive containing the reproduction I uploaded, taking out -pie doesn't fix the problem anymore.
[Bug c/116642] miscompilation involving vfork child.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116642 --- Comment #26 from Joshua ---

I've been wondering how to fix this item. Having found the faulting assembly code, vfork is incidental to the problem. Removing vfork simply perturbs it away without actually fixing it. I actually hit this once before with execve but without vfork; I failed to find the actual problem, and used memcpy to build up the array in a buffer rather than the inline initializer to get rid of it.

What this code *is* is the root component of a 20,000 line web management interface. It's still growing; the final target looks like it's going to be 50,000 lines with 1,200 lines of root code. Everything it doesn't need has been stripped away. libc is gone, replaced with three hundred lines of assembly mostly declaring system calls. The dynamic linker is gone; one less piece of code to audit.

So the root failing construct is:

.LC16:
        .quad   .LC7

You can't *do* that in this environment. No relocations allowed.

The sse code that is using it is a pessimization, not an optimization, anyway. To avoid a register stall it wants a relocation and a memory access. That relocation costs more than the register stall.

So ultimately the best way to fix it is to stop generating that construct and similar constructs, as though this were the ELF relocation processor itself. If -mno-sse is what it takes, that's fine. But I need to know exactly how it's intended to be done. Adding more compiler options costs me nothing. Adding more compiler options blindly hoping I've got the right ones costs too much.

So far I haven't gotten a new upstream gcc; I could if it fixed the problem. It's just a build management problem that I'm not going to tackle unless I need to.
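The memcpy workaround mentioned in this comment is not shown in the thread, so the following is only a guess at its general shape. The struct arg_block, the helpers push_arg and fill_args, and the buffer size are all hypothetical. The idea sketched is to copy each string into a local buffer and point argv[] into that buffer, so that the pointer values are computed at run time from the buffer's address instead of coming from a table of absolute addresses.

```c
#include <string.h>

#define ARG_BUF_SIZE 64   /* assumed large enough for the strings below */

struct arg_block {
    char  buf[ARG_BUF_SIZE];  /* string bytes live here */
    char *argv[4];            /* pointers into buf, NULL-terminated */
};

/* Copy one NUL-terminated string into the buffer at offset off, record
 * its address in argv[idx], and return the offset of the next free byte.
 * The caller is responsible for making sure the string fits. */
static size_t push_arg(struct arg_block *b, size_t off, int idx, const char *s)
{
    size_t n = strlen(s) + 1;          /* include the terminating NUL */
    memcpy(b->buf + off, s, n);
    b->argv[idx] = b->buf + off;
    return off + n;
}

/* Build a small argument vector at run time. */
void fill_args(struct arg_block *b)
{
    size_t off = 0;
    off = push_arg(b, off, 0, "/bin/false");
    off = push_arg(b, off, 1, "-k");
    off = push_arg(b, off, 2, "/u/U/etc/skel");
    (void)off;
    b->argv[3] = NULL;
}
```

Whether a given compiler version still decides to materialize any of this through a relocated constant is not something this sketch can guarantee; it only shows the construction pattern.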
[Bug c/116642] miscompilation involving vfork child.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116642 --- Comment #32 from Joshua ---

>> Build your program as a static PIE and use assembly
>> (or a very limited C subset) to relocate itself on startup.

> All other implementations (Glibc, Musl, and Linux kernel
> with KASLR enabled on modern architectures like RISC-V)
> do this, instead of urging the compiler to add some
> feature to disable all relocs.

Yeah, about that. Both glibc and musl do it with a restricted subset of C code. I actually looked at the build step of the relocation engine file (elf/dl-reloc-static-pie.c in glibc) for options to pass to the compiler. It looks to me very much like we're pushing up against a situation where further machine optimizations could start introducing a relocation in the middle of the relocation code.

Say the top of elf/dl-reloc-static-pie.c might compile to something like this and I see nothing stopping it:

29   struct link_map *main_map = _dl_get_dl_main_map ();
        leaq dl_main_map, %rdi
37   main_map->l_addr = elf_machine_load_address ();
        movabs .LC1, %xmm0      ;; .LC1: .quad load_address
40   main_map->l_ld = ((void *) main_map->l_addr + elf_machine_dynamic ());
        leaq load_address, %rsi
        leaq dynamic_address(%rsi), %rsi
        movabs %rsi, %xmm1
        punpcklqdq
        movabs %xmm0, (%rdi)

And if you think this assembly is nonsensical, it's the same assembly that's causing the fault, for what looks like the same reasons. And that's reason enough.
[Bug c/116642] miscompilation involving vfork child.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116642 --- Comment #33 from Joshua ---

Florian wrote:

> It's quite brittle and requires constant maintenance as the compiler
> changes. We've simplified the code significantly of the past few years
> and may finally be able to properly fix it, as far such a thing is
> possible without rewriting self-relocation processing in assembler.
>
> Thanks,
> Florian

Which was the piece of knowledge I was missing.
[Bug c/118678] New: Dubious optimization when compiling with -fpie -Os
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118678

Bug ID: 118678
Summary: Dubious optimization when compiling with -fpie -Os
Product: gcc
Version: 12.2.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
Assignee: unassigned at gcc dot gnu.org
Reporter: joshudson at gmail dot com
Target Milestone: ---

The following code fragment demonstrates the problem:

extern int execve2(const char *name, const char**args, const char **environ);

int function()
{
    const char *args[] = { "/bin/echo", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine", 0 };
    return -execve2(args[0], args, 0);
}

When compiling with -S -Os it finds an optimization with the weight of the code being eight bytes per array member:

        leaq    .LC11(%rip), %rsi
        movl    $22, %ecx
        xorl    %edx, %edx
        leaq    8(%rsp), %rdi
        rep movsl

It could have been better by saying

        leaq    .LC11(%rip), %rsi
        movl    $11, %ecx
        xorl    %edx, %edx
        leaq    8(%rsp), %rdi
        rep movsq

However, the bug that surprised me is that it continues to generate this movs instruction when compiling with -S -Os -fpie; the optimization is no longer nice because this now requires *eleven* relocations, with the weight of the code being 16 bytes per array member; the simple solution generated by -S -O3 -mno-sse only uses 12 bytes per array member.

I don't really know how the guts of the optimizer work; but I haven't been able to get it to generate this anywhere other than an array of constants, and it often misses it; so I'm not too sure how hard this is to fix.
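The repeat counts 22 and 11 quoted in this report follow from the size of the array. As a small sketch (the helper names entry_count, movsl_count and movsq_count are made up for illustration, and 8-byte pointers are assumed, as on x86-64): eleven pointers occupy 88 bytes, which is 22 four-byte moves for rep movsl or 11 eight-byte moves for rep movsq.

```c
#include <stddef.h>

/* Same initializer as the fragment in the report. */
static const char *const args[] = {
    "/bin/echo", "one", "two", "three", "four", "five",
    "six", "seven", "eight", "nine", 0
};

/* Number of array members, including the terminating null pointer. */
size_t entry_count(void) { return sizeof args / sizeof args[0]; }

/* Iterations needed to copy the array 4 bytes at a time (rep movsl). */
size_t movsl_count(void) { return sizeof args / 4; }

/* Iterations needed to copy the array 8 bytes at a time (rep movsq). */
size_t movsq_count(void) { return sizeof args / 8; }
```

With 8-byte pointers these evaluate to 11, 22 and 11 respectively, matching the values loaded into %ecx in the two listings.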