[Bug c/116642] New: miscompilation involving vfork child.

2024-09-07 Thread joshudson at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116642

            Bug ID: 116642
           Summary: miscompilation involving vfork child.
           Product: gcc
           Version: 12.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: joshudson at gmail dot com
  Target Milestone: ---

Created attachment 59073
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=59073&action=edit
main reproduction source file

GCC Version: gcc (Debian 12.2.0-14) 12.2.0

Build Command:
gcc -o nutty -Wall -Wl,-T,minipie.ld -Wl,--no-dynamic-linker -pie -nostdlib
-nostartfiles -O3 -s -fpic -fno-asynchronous-unwind-tables -ffreestanding
package.s nutty.i

The resulting binary doesn't work: the first array member passed to
execve() is out of bounds. This is easily observed with strace -f:

execve("./nutty", ["./nutty"], 0x7fffc4882e28 /* 41 vars */) = 0
open("uid.db", O_RDWR|O_CREAT, 0600)= 3
flock(3, LOCK_EX)   = 0
read(3, "", 4096)   = 0
vfork(strace: Process 19097 attached
 
[pid 19097] execve("/bin/false", [0x82d, "-md", "/u/U/home/U", "-k",
"/u/U/etc/skel", "-s", "", "-u", "6", "-g", "urun", "U"], NULL) =
-1 EFAULT (Bad address)
[pid 19097] exit(0) = ?
[pid 19096] <... vfork resumed>)= 19097
[pid 19097] +++ exited with 0 +++
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=19097, si_uid=1000,
si_status=0, si_utime=0, si_stime=0} ---
wait4(19097, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 19097
exit(14)= ?
+++ exited with 14 +++

(0x82d is bogus...)

I have had enormous difficulty reducing this; it behaves very much as if the
function needs a certain weight of code in it, or the miscompilation
disappears. It almost seems to have something to do with the xmm registers, but
I haven't been able to locate the definite problem in the assembly file.

This code has already been reduced far beyond sensibility and won't make sense
to you; it's no longer doing anything meaningful.

Either of the following small changes results in the code working:
1) move the declaration of argv[] outside the if vfork() statement
2) put a writeblock function call inside the vfork() if block before execve. I
only tried it after the argv[] declaration; never before.
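For readers unfamiliar with the pattern, the first workaround amounts to building the argument vector in the parent before vfork(), so the child does nothing but call execve() and, if that fails, _exit(). The following is a minimal sketch assuming Linux with glibc; spawn_and_wait is a hypothetical helper name, not a function from the attached reproduction, and the sketch makes no claim about what the miscompiled code should have looked like.

```c
#define _GNU_SOURCE /* exposes vfork() in glibc's <unistd.h> */
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run `path` with `argv` and an empty environment,
 * returning the child's exit status, or -1 on failure.
 * argv and envp are set up before vfork() (the shape of workaround 1),
 * so the child only calls execve() and, if that fails, _exit(). */
int spawn_and_wait(const char *path, char *const argv[])
{
    char *const envp[] = { NULL };   /* built in the parent, before vfork() */

    pid_t pid = vfork();
    if (pid == 0) {
        execve(path, argv, envp);    /* returns only on failure */
        _exit(127);                  /* never return from the vfork child */
    }
    if (pid < 0)
        return -1;

    int status;
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

Passing a path that does not exist makes execve() fail in the child, so the helper reports exit status 127.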

For some reason the compiler thinks _exit() returns; this is probably a good
thing, since otherwise there's a chance it would overlap argv[] with something
vital.

There's been some discussion online (unfortunately marred by a vfork opponent)
as to whether or not this is the signature of undefined behavior; however, no
undefined behavior has been found, and we can't come up with a remotely
plausible hypothesis for how undefined behavior could cause this particular
malfunction: all workable hypotheses would fail after a successful execve, not
upon calling it.

[Bug c/116642] miscompilation involving vfork child.

2024-09-07 Thread joshudson at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116642

--- Comment #1 from Joshua  ---
Created attachment 59074
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=59074&action=edit
runtime required to build nutty.i all the way to an executable

[Bug c/116642] miscompilation involving vfork child.

2024-09-07 Thread joshudson at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116642

--- Comment #2 from Joshua  ---
Created attachment 59075
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=59075&action=edit
linker script used to build nutty.i (not sure whether it is required to build
the executable)

[Bug c/116642] miscompilation involving vfork child.

2024-09-07 Thread joshudson at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116642

--- Comment #8 from Joshua  ---
I am absolutely certain declaring new local variables inside the vfork() block
is kosher.

Modifying variables outside the vfork() block simply requires marking them
volatile; in this respect the rule is the same as for setjmp().
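The setjmp() half of that comparison is standard C: an automatic variable that is modified between setjmp() and longjmp() only has a determinate value afterwards if it is volatile-qualified (C11 7.13.2.1). The following is a minimal sketch of that rule; count_after_longjmp and bail are hypothetical names. Whether a vfork() child is bound by exactly the same rule is the commenter's position; the historical POSIX text for vfork() was stricter, leaving behavior undefined if the child modified any data other than the pid_t holding vfork()'s return value, returned from the calling function, or called any function before _exit() or an exec function.

```c
#include <setjmp.h>

static jmp_buf env;

/* Hypothetical helper that jumps back to the setjmp() point. */
static void bail(void)
{
    longjmp(env, 1);
}

/* Returns the value of `counter` after control comes back through setjmp().
 * Because `counter` is volatile, the store made between setjmp() and
 * longjmp() is guaranteed to be visible; without volatile its value
 * would be indeterminate at the return statement. */
int count_after_longjmp(void)
{
    volatile int counter = 0;
    if (setjmp(env) == 0) {
        counter = 42;   /* modified after setjmp() ... */
        bail();         /* ... then control returns to setjmp() with 1 */
    }
    return counter;
}
```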

[Bug c/116642] miscompilation involving vfork child.

2024-09-07 Thread joshudson at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116642

--- Comment #9 from Joshua  ---
Update: vfork is not the problem. If the function is _called_ vfork the bug
happens. The bug remains if I update package.s to use __NR_fork instead of
__NR_vfork.

[Bug c/116642] miscompilation involving vfork child.

2024-09-07 Thread joshudson at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116642

--- Comment #11 from Joshua  ---
So far I have the following.

I have modified the asm stub so that vfork() is an alias for fork(); of course
gcc doesn't know about that, and that's the point.

If I remove -ffreestanding I can verify the asm is correct but the code doesn't
work for other reasons I'm not in a position to fix.

If I compile with -mno-sse -mno-sse2 I can verify the asm is correct and this
time the binary works.

The generated asm with sse looks dodgy but I haven't been able to find a
definite fault yet.

-ffreestanding does two things: 1) it removes some basic assumptions about
what functions do, and 2) it prevents gcc from calling library functions
implicitly. I only need the second behavior, to suppress internally generated
calls to memset.

I *think* the bug has something to do with assumptions about what vfork does
that just aren't true, because part of gcc's internal knowledge of which
function names are special is missing when -ffreestanding is used.

Tell you what: I can try placing volatile on argv[] immediately; if that fixes
the problem I'm willing to shut up and say that's perfectly reasonable; on the
other hand if it doesn't fix the problem it's time for somebody else to look at
it.

[Bug c/116642] miscompilation involving vfork child.

2024-09-07 Thread joshudson at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116642

--- Comment #12 from Joshua  ---
I located the actual bug in the disassembly. While it is somehow triggered by
vfork, it is not due to vfork or the vfork rules. The bad assembly code is here:

.LC16:
.quad   .LC7

You can't *do* that in this environment. No relocations allowed.

[Bug c/116642] miscompilation involving vfork child.

2024-09-07 Thread joshudson at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116642

--- Comment #14 from Joshua  ---
Is this just a doc problem then?

It says I must use -fpie to make a relocatable executable.

I need a relocatable executable with no relocations. There is nothing in the
source code requiring a relocation, so I am surprised that gcc tries to emit one.

It's too bad I don't get a compile error if I try to write something like this:

static char *str = "/bin/false";
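The underlying issue is that a pointer stored in initialized data (like str above, or the `.LC16: .quad .LC7` entry) is an absolute address that must be patched when a PIE is loaded. One way to keep such pointers out of the data is to store integer offsets into a single string blob and form the pointers at run time; in PIC code the base address is typically computed PC-relative. The following is a minimal sketch; strtab, stroff and build_argv are hypothetical names. It is not a guarantee: as this bug shows, the optimizer may still fold constant pointer arithmetic back into a table of absolute addresses, so the generated assembly still has to be checked.

```c
#include <stddef.h>

/* All strings live in one array, separated by NULs. */
static const char strtab[] = "/bin/echo\0one\0two";
/* Offsets into strtab: plain integers, so this table holds no addresses. */
static const unsigned short stroff[] = { 0, 10, 14 };

#define NARGS (sizeof stroff / sizeof stroff[0])

/* Fill argv (room for NARGS + 1 entries) at run time from base + offset,
 * instead of storing absolute pointers in initialized data. */
void build_argv(const char *argv[NARGS + 1])
{
    for (size_t i = 0; i < NARGS; i++)
        argv[i] = strtab + stroff[i];
    argv[NARGS] = NULL;   /* execve()-style terminator */
}
```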

[Bug c/116642] miscompilation involving vfork child.

2024-09-07 Thread joshudson at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116642

--- Comment #16 from Joshua  ---
I took out -fpie and the output assembly was different and the binary started
working. That is contrary to the documentation which says you need -fpie for
position independent executables.

I suppose it would be consistent with the documentation if I compiled with
-fpic -fno-pie and linked with -fpie, but it doesn't make sense to me why the
difference *matters*.

It still feels very weird, like yet another arbitrary change that happened to
stop it generating the relocation, just like the hundreds of changes I made
trying to produce the minimal case that generated it.

If I *knew* it was sse that's generating relocations behind my back I'd just
turn it off and be done with it, but I don't. I only know it's one of many CPU
features.

[Bug c/116642] miscompilation involving vfork child.

2024-09-07 Thread joshudson at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116642

--- Comment #18 from Joshua  ---
I stand corrected. I removed -pie.

The resulting output binary is still relocatable in memory. I don't think the
kernel is willing to load an ELF binary at address 0, and that's the only other
option with this header. (I ran hexdump on the binary to verify.)

Hmmm, that's really weird. The way you describe it, that shouldn't change the
compiler pass.

[Bug c/116642] miscompilation involving vfork child.

2024-09-07 Thread joshudson at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116642

--- Comment #21 from Joshua  ---
Well then; I guess I'm having pain every time I upgrade the compiler no matter
what.

[Bug c/116642] miscompilation involving vfork child.

2024-09-08 Thread joshudson at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116642

--- Comment #25 from Joshua  ---
Guys, I goofed and don't know what I actually did.

I failed to reset one of the other hypotheses after finding the problem in the
disassembly. On re-unpacking the archive containing the reproduction I
uploaded, taking out -pie doesn't fix the problem anymore.

[Bug c/116642] miscompilation involving vfork child.

2024-09-08 Thread joshudson at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116642

--- Comment #26 from Joshua  ---
I've been wondering how to fix this item. Having found the faulting assembly
code, I can see that vfork is incidental to the problem. Removing vfork simply
perturbs it away rather than actually fixing it.

I actually hit this once before with execve but without vfork; I failed to find
the actual problem and used memcpy to build up the array in a buffer rather
than the inline initializer to get rid of it.

What this code *is* is the root component of a 20,000 line web management
interface. It's still growing; the final target looks like it's going to be
50,000 lines with 1,200 lines of root code.

Everything it doesn't need has been stripped away. libc is gone, replaced with
three hundred lines of assembly mostly declaring system calls. The dynamic
linker is gone; one less piece of code to audit.

So the root failing construct is:

.LC16:
.quad   .LC7

You can't *do* that in this environment. No relocations allowed.

The sse code that is using it is a pessimization, not an optimization, anyway.
To avoid a register stall it uses a relocation and a memory access, and that
relocation costs more than the register stall.

So ultimately the best way to fix it is to stop generating that construct and
similar constructs, as though this were the ELF relocation processor itself.
If -mno-sse is what it takes, that's fine. But I need to know exactly how it's
intended to be done. Adding more compiler options costs me nothing. Adding more
compiler options blindly, hoping I've got the right ones, costs too much.

So far I haven't gotten a new upstream gcc; I could if it fixed the problem.
It's just a build management problem that I'm not going to tackle unless I need
to.

[Bug c/116642] miscompilation involving vfork child.

2024-09-09 Thread joshudson at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116642

--- Comment #32 from Joshua  ---
>> Build your program as a static PIE and use assembly
>> (or a very limited C subset) to relocate itself on startup.

> All other implementations (Glibc, Musl, and Linux kernel
> with KASLR enabled on modern architectures like RISC-V)
> do this, instead of urging the compiler to add some
> feature to disable all relocs.

Yeah, about that. Both glibc and musl do it with a
restricted subset of C code. I actually looked at the build
step of the relocation engine file
(elf/dl-reloc-static-pie.c in glibc) for options to pass
to the compiler.

It looks to me very much like we're pushing up against a
situation where further machine optimizations could start
introducing a relocation in the middle of the relocation code.

Say the top of elf/dl-reloc-static-pie.c compiled to something like this; I see
nothing stopping it:

  29   struct link_map *main_map = _dl_get_dl_main_map ();

leaq dl_main_map, %rdi

  37   main_map->l_addr = elf_machine_load_address ();

movabs   .LC1, %xmm0   ;; .LC1: .quad load_address

  40   main_map->l_ld = ((void *) main_map->l_addr + elf_machine_dynamic ());

leaq load_address, %rsi
leaq dynamic_address(%rsi), %rsi
movabs   %rsi, %xmm1
punpcklqdq
movabs   %xmm0, (%rdi)

And if you think this assembly is nonsensical, it's the same assembly that's
causing the fault, for what looks like the same reasons.

And that's reason enough.
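For reference, the core of the self-relocation step suggested in the quoted reply is small on x86-64 if, as assumed here, the only dynamic relocations are R_X86_64_RELATIVE: each entry says the 64-bit word at base + r_offset must hold base + r_addend. The following is a minimal sketch; apply_relative_relocs is a hypothetical name, and it assumes the caller has already found the load base and the Elf64_Rela array (for example through the DT_RELA and DT_RELASZ entries of _DYNAMIC). It does not address the concern raised above: nothing in it stops the compiler from introducing a relocation into this function itself, which is why such code is kept to assembly or a very limited C subset.

```c
#include <elf.h>
#include <stddef.h>
#include <stdint.h>

/* Apply R_X86_64_RELATIVE entries against the load base.
 * Other relocation types are skipped here; a real loader would have
 * to handle them or refuse to start. */
void apply_relative_relocs(uintptr_t base, const Elf64_Rela *rela, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        if (ELF64_R_TYPE(rela[i].r_info) != R_X86_64_RELATIVE)
            continue;
        uint64_t *where = (uint64_t *)(base + rela[i].r_offset);
        *where = base + (uint64_t)rela[i].r_addend;
    }
}
```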

[Bug c/116642] miscompilation involving vfork child.

2024-09-09 Thread joshudson at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116642

--- Comment #33 from Joshua  ---
Florian wrote:

> It's quite brittle and requires constant maintenance as the compiler
> changes.  We've simplified the code significantly over the past few years
> and may finally be able to properly fix it, as far as such a thing is
> possible without rewriting self-relocation processing in assembler.
>
> Thanks,
> Florian

That was the piece of knowledge I was missing.

[Bug c/118678] New: Dubious optimization when compiling with -fpie -Os

2025-01-27 Thread joshudson at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118678

            Bug ID: 118678
           Summary: Dubious optimization when compiling with -fpie -Os
           Product: gcc
           Version: 12.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: joshudson at gmail dot com
  Target Milestone: ---

The following code fragment demonstrates the problem:

extern int execve2(const char *name, const char **args, const char **environ);

int function()
{
    const char *args[] = { "/bin/echo", "one", "two", "three", "four",
                           "five", "six", "seven", "eight", "nine", 0 };
    return -execve2(args[0], args, 0);
}


When compiling with -S -Os it finds an optimization with the weight of the
code being eight bytes per array member:

        leaq    .LC11(%rip), %rsi
        movl    $22, %ecx
        xorl    %edx, %edx
        leaq    8(%rsp), %rdi
        rep movsl

It could have been better by saying

        leaq    .LC11(%rip), %rsi
        movl    $11, %ecx
        xorl    %edx, %edx
        leaq    8(%rsp), %rdi
        rep movsq

However, the bug that surprised me is that it continues to generate this movs
instruction when compiling with -S -Os -fpie; the optimization is no longer
nice because it now requires *eleven* relocations, with the weight of the code
being 16 bytes per array member, while the simple code generated by -S -O3
-mno-sse uses only 12 bytes per array member.

I don't really know how the guts of the optimizer work, but I haven't been able
to get it to generate this anywhere other than an array of constants, and it
often misses it, so I'm not sure how hard this is to fix.