[Bug target/82303] Better PIE/PIC code generation for kernel code (x86_64 & arm64)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82303

--- Comment #5 from Thomas Garnier ---
I haven't tried the patch yet, but it could be a good starting point (changes to the switch optimization and to the segment registers are still needed). What is the consequence of the change in default_binds_local_p_3? Is it supposed to remove the need for a GOT/PLT?
[Bug target/82303] Better PIE/PIC code generation for kernel code (x86_64 & arm64)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82303

--- Comment #7 from Thomas Garnier ---
Created attachment 43189
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=43189&action=edit
testcase for mcmodel=large

Build with:

    gcc -mcmodel=large -c -fstatic-pie ./test.c -o test

Dump relocations on the object file:

    objdump -dr ./test
[Bug target/82303] Better PIE/PIC code generation for kernel code (x86_64 & arm64)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82303

Thomas Garnier changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
 Attachment #43189|0                           |1
        is obsolete|                            |

--- Comment #8 from Thomas Garnier ---
Created attachment 43190
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=43190&action=edit
testcase for mcmodel=large
[Bug target/82303] Better PIE/PIC code generation for kernel code (x86_64 & arm64)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82303

--- Comment #9 from Thomas Garnier ---
I tested the change against a modified version of the proposed Linux x86_64 PIE support. The change removes all the PLT32 and GOT64 entries, but I still get R_X86_64_GOTPC64 and R_X86_64_GOTOFF64 relocations in head64.c, which is built with -mcmodel=large (to avoid odd logic during early boot, when the code runs at a different virtual address). Do you think the suggested patch can be changed to remove these?

To reproduce, build the object file with:

    gcc -mcmodel=large -c -fstatic-pie ./test.c -o test

The objdump -dr output of the testcase:

       0:  55                     push   %rbp
       1:  48 89 e5               mov    %rsp,%rbp
       4:  48 83 ec 20            sub    $0x20,%rsp
       8:  48 8d 05 f9 ff ff ff   lea    -0x7(%rip),%rax        # 8
       f:  49 bb 00 00 00 00 00   movabs $0x0,%r11
      16:  00 00 00
                 11: R_X86_64_GOTPC64   _GLOBAL_OFFSET_TABLE_+0x9
      19:  4c 01 d8               add    %r11,%rax
      1c:  89 7d ec               mov    %edi,-0x14(%rbp)
      1f:  48 89 75 e0            mov    %rsi,-0x20(%rbp)
      23:  48 ba 00 00 00 00 00   movabs $0x0,%rdx
      2a:  00 00 00
                 25: R_X86_64_GOTOFF64  _text-0x1023
      2d:  48 8d 14 10            lea    (%rax,%rdx,1),%rdx
      31:  89 55 fc               mov    %edx,-0x4(%rbp)
      34:  8b 55 ec               mov    -0x14(%rbp),%edx
      37:  48 63 d2               movslq %edx,%rdx
      3a:  48 8b 4d e0            mov    -0x20(%rbp),%rcx
      3e:  48 89 ce               mov    %rcx,%rsi
      41:  48 b9 00 00 00 00 00   movabs $0x0,%rcx
      48:  00 00 00
                 43: R_X86_64_GOTOFF64  _text
      4b:  48 8d 3c 08            lea    (%rax,%rcx,1),%rdi
      4f:  48 b9 00 00 00 00 00   movabs $0x0,%rcx
      56:  00 00 00
                 51: R_X86_64_GOTOFF64  memcpy
      59:  48 8d 04 08            lea    (%rax,%rcx,1),%rax
      5d:  ff d0                  callq  *%rax
      5f:  8b 45 fc               mov    -0x4(%rbp),%eax
      62:  c9                     leaveq
      63:  c3                     retq
[Bug target/82303] Better PIE/PIC code generation for kernel code (x86_64 & arm64)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82303

--- Comment #11 from Thomas Garnier ---
I think using only -mcmodel=large makes more sense for this file. Given that the proposed option (-fstatic-pie) is not kernel-specific, the TLS change is not needed.

What do you think about disabling optimizations like switch folding [1]? It seems to exist only to avoid relocations, which is not even necessary in a classic -fPIE (or -fstatic-pie) scenario.

[1] https://github.com/gcc-mirror/gcc/blob/7977b0509f07e42fbe0f06efcdead2b7e4a5135f/gcc/tree-switch-conversion.c#L828
[Bug target/82303] Better PIE/PIC code generation for kernel code (x86_64 & arm64)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82303

--- Comment #13 from Thomas Garnier ---
Created attachment 43223
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=43223&action=edit
testcase for switch folding

No switch folding if built with:

    $CC -O2 -fno-PIE -c -o switch ./switch.c

Switch folding if built with:

    $CC -O2 -fPIE -c -o switch ./switch.c

or:

    $CC -O2 -fstatic-pie -c -o switch ./switch.c
[Bug target/82303] Better PIE/PIC code generation for kernel code (x86_64 & arm64)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82303

--- Comment #14 from Thomas Garnier ---
Correcting what I said before: this is about re-enabling switch folding (or switch optimization). Without PIE (-fno-PIE) at -O2, a switch can be optimized to:

       0:  b8 00 00 00 00         mov    $0x0,%eax
                  1: R_X86_64_32        .rodata.str1.1
       5:  83 ff 16               cmp    $0x16,%edi
       8:  77 0a                  ja     14
       a:  89 ff                  mov    %edi,%edi
       c:  48 8b 04 fd 00 00 00   mov    0x0(,%rdi,8),%rax
      13:  00
                 10: R_X86_64_32S       .rodata
      14:  c3                     retq

With PIE and -O2 it becomes:

       0:  83 ff 16               cmp    $0x16,%edi
       3:  0f 87 87 01 00 00      ja     190
       9:  48 8d 15 00 00 00 00   lea    0x0(%rip),%rdx        # 10
                  c: R_X86_64_PC32      .rodata-0x4
      10:  89 ff                  mov    %edi,%edi
      12:  48 63 04 ba            movslq (%rdx,%rdi,4),%rax
      16:  48 01 d0               add    %rdx,%rax
      19:  ff e0                  jmpq   *%rax
      1b:  0f 1f 44 00 00         nopl   0x0(%rax,%rax,1)
      20:  48 8d 05 00 00 00 00   lea    0x0(%rip),%rax        # 27
                 23: R_X86_64_PC32      .LC1-0x4
      27:  c3                     retq
      28:  0f 1f 84 00 00 00 00   nopl   0x0(%rax,%rax,1)
      2f:  00
      30:  48 8d 05 00 00 00 00   lea    0x0(%rip),%rax        # 37
                 33: R_X86_64_PC32      .LC22-0x4
      37:  c3                     retq
      38:  0f 1f 84 00 00 00 00   nopl   0x0(%rax,%rax,1)
      3f:  00
      40:  48 8d 05 00 00 00 00   lea    0x0(%rip),%rax        # 47
                 43: R_X86_64_PC32      .LC21-0x4
      47:  c3                     retq
      48:  0f 1f 84 00 00 00 00   nopl   0x0(%rax,%rax,1)
      4f:  00
      50:  48 8d 05 00 00 00 00   lea    0x0(%rip),%rax        # 57
                 53: R_X86_64_PC32      .LC20-0x4
      57:  c3                     retq
      58:  0f 1f 84 00 00 00 00   nopl   0x0(%rax,%rax,1)
      5f:  00
      60:  48 8d 05 00 00 00 00   lea    0x0(%rip),%rax        # 67
                 63: R_X86_64_PC32      .LC19-0x4
      67:  c3                     retq
      68:  0f 1f 84 00 00 00 00   nopl   0x0(%rax,%rax,1)
      6f:  00
      70:  48 8d 05 00 00 00 00   lea    0x0(%rip),%rax        # 77
                 73: R_X86_64_PC32      .LC18-0x4
      77:  c3                     retq
      78:  0f 1f 84 00 00 00 00   nopl   0x0(%rax,%rax,1)
      7f:  00
      80:  48 8d 05 00 00 00 00   lea    0x0(%rip),%rax        # 87
                 83: R_X86_64_PC32      .LC17-0x4
      87:  c3                     retq
      88:  0f 1f 84 00 00 00 00   nopl   0x0(%rax,%rax,1)
      8f:  00
      90:  48 8d 05 00 00 00 00   lea    0x0(%rip),%rax        # 97
                 93: R_X86_64_PC32      .LC16-0x4
      97:  c3                     retq
      98:  0f 1f 84 00 00 00 00   nopl   0x0(%rax,%rax,1)
      9f:  00
      a0:  48 8d 05 00 00 00 00   lea    0x0(%rip),%rax        # a7
                 a3: R_X86_64_PC32      .LC15-0x4
      a7:  c3                     retq
      a8:  0f 1f 84 00 00 00 00   nopl   0x0(%rax,%rax,1)
      af:  00
      b0:  48 8d 05 00 00 00 00   lea    0x0(%rip),%rax        # b7
                 b3: R_X86_64_PC32      .LC14-0x4
      b7:  c3                     retq
      b8:  0f 1f 84 00 00 00 00   nopl   0x0(%rax,%rax,1)
      bf:  00
      c0:  48 8d 05 00 00 00 00   lea    0x0(%rip),%rax        # c7
                 c3: R_X86_64_PC32      .LC13-0x4
      c7:  c3                     retq
      c8:  0f 1f 84 00 00 00 00   nopl   0x0(%rax,%rax,1)
      cf:  00
      d0:  48 8d 05 00 00 00 00   lea    0x0(%rip),%rax        # d7
                 d3: R_X86_64_PC32      .LC12-0x4
      d7:  c3                     retq
      d8:  0f 1f 84 00 00 00 00   nopl   0x0(%rax,%rax,1)
      df:  00
      e0:  48 8d 05 00 00 00 00   lea    0x0(%rip),%rax        # e7
                 e3: R_X86_64_PC32      .LC11-0x4
      e7:  c3                     retq
      e8:  0f 1f 84 00 00 00 00   nopl   0x0(%rax,%rax,1)
      ef:  00
      f0:  48 8d 05 00 00 00 00   lea    0x0(%rip),%rax        # f7
                 f3: R_X86_64_PC32      .LC10-0x4
      f7:  c3                     retq
      f8:  0f 1f 84 00 00 00 00   nopl   0x0(%rax,%rax,1)
      ff:  00
     100:  48 8d 05 00 00 00 00   lea    0x0(%rip),%rax        # 107
                103: R_X86_64_PC32      .LC9-0x4
     107:  c3                     retq
     108:  0f 1f 84 00 00 00 00   nopl   0x0(%rax,%rax,1)
     10f:  00
     110:  48 8d 05 00 00 00 00   lea    0x0(%rip),%rax        # 117
                113: R_X86_64_PC32      .LC8-0x4
     117:  c3                     retq
     118:  0f 1f 84 00 00 00 00   nopl   0x0(%rax,%rax,1)
     11f:  00
     120:  48 8d 05 00 00 00 00   lea    0x0(%rip
[Bug target/82303] Better PIE/PIC code generation for kernel code (x86_64 & arm64)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82303

--- Comment #16 from Thomas Garnier ---
Yes, I don't think you can just default to the non-PIE mode. Clang handles it well, though:

       0:  83 ff 16               cmp    $0x16,%edi
       3:  77 0f                  ja     14
       5:  48 63 c7               movslq %edi,%rax
       8:  48 8d 0d 00 00 00 00   lea    0x0(%rip),%rcx        # f
                  b: R_X86_64_PC32      .data.rel.ro-0x4
       f:  48 8b 04 c1            mov    (%rcx,%rax,8),%rax
      13:  c3                     retq
      14:  48 8d 05 00 00 00 00   lea    0x0(%rip),%rax        # 1b
                 17: R_X86_64_PC32      .L.str.23-0x4
      1b:  c3                     retq
[Bug target/84011] New: Optimize switch table with -fPIE
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84011

            Bug ID: 84011
           Summary: Optimize switch table with -fPIE
           Product: gcc
           Version: 8.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: thgarnie at google dot com
  Target Milestone: ---

Created attachment 43225
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=43225&action=edit
Switch test case

With -O2 and -fPIE, switch tables are never optimized as well as they could be. I assume that is to reduce the number of relocations in code sections for COW page support. That makes sense for -fPIC but not for -fPIE.

The Linux kernel x86_64 PIE prototype ends up with big switch tables where the code could be optimized to a few lines.

With -fno-PIE -O2:

       0:  b8 00 00 00 00         mov    $0x0,%eax
                  1: R_X86_64_32        .rodata.str1.1
       5:  83 ff 16               cmp    $0x16,%edi
       8:  77 0a                  ja     14
       a:  89 ff                  mov    %edi,%edi
       c:  48 8b 04 fd 00 00 00   mov    0x0(,%rdi,8),%rax
      13:  00
                 10: R_X86_64_32S       .rodata
      14:  c3                     retq

With -fPIE -O2:

       0:  83 ff 16               cmp    $0x16,%edi
       3:  0f 87 87 01 00 00      ja     190
       9:  48 8d 15 00 00 00 00   lea    0x0(%rip),%rdx        # 10
                  c: R_X86_64_PC32      .rodata-0x4
      10:  89 ff                  mov    %edi,%edi
      12:  48 63 04 ba            movslq (%rdx,%rdi,4),%rax
      16:  48 01 d0               add    %rdx,%rax
      19:  ff e0                  jmpq   *%rax
      1b:  0f 1f 44 00 00         nopl   0x0(%rax,%rax,1)
      20:  48 8d 05 00 00 00 00   lea    0x0(%rip),%rax        # 27
                 23: R_X86_64_PC32      .LC1-0x4
      27:  c3                     retq
      28:  0f 1f 84 00 00 00 00   nopl   0x0(%rax,%rax,1)
      2f:  00
      30:  48 8d 05 00 00 00 00   lea    0x0(%rip),%rax        # 37
                 33: R_X86_64_PC32      .LC22-0x4
      37:  c3                     retq
      38:  0f 1f 84 00 00 00 00   nopl   0x0(%rax,%rax,1)
      3f:  00
      40:  48 8d 05 00 00 00 00   lea    0x0(%rip),%rax        # 47
                 43: R_X86_64_PC32      .LC21-0x4
      47:  c3                     retq
      48:  0f 1f 84 00 00 00 00   nopl   0x0(%rax,%rax,1)
      4f:  00
      50:  48 8d 05 00 00 00 00   lea    0x0(%rip),%rax        # 57
                 53: R_X86_64_PC32      .LC20-0x4
      57:  c3                     retq
      58:  0f 1f 84 00 00 00 00   nopl   0x0(%rax,%rax,1)
      5f:  00
      60:  48 8d 05 00 00 00 00   lea    0x0(%rip),%rax        # 67
                 63: R_X86_64_PC32      .LC19-0x4
      67:  c3                     retq
      68:  0f 1f 84 00 00 00 00   nopl   0x0(%rax,%rax,1)
      6f:  00
      70:  48 8d 05 00 00 00 00   lea    0x0(%rip),%rax        # 77
                 73: R_X86_64_PC32      .LC18-0x4
      77:  c3                     retq
      78:  0f 1f 84 00 00 00 00   nopl   0x0(%rax,%rax,1)
      7f:  00
      80:  48 8d 05 00 00 00 00   lea    0x0(%rip),%rax        # 87
                 83: R_X86_64_PC32      .LC17-0x4
      87:  c3                     retq
      88:  0f 1f 84 00 00 00 00   nopl   0x0(%rax,%rax,1)
      8f:  00
      90:  48 8d 05 00 00 00 00   lea    0x0(%rip),%rax        # 97
                 93: R_X86_64_PC32      .LC16-0x4
      97:  c3                     retq
      98:  0f 1f 84 00 00 00 00   nopl   0x0(%rax,%rax,1)
      9f:  00
      a0:  48 8d 05 00 00 00 00   lea    0x0(%rip),%rax        # a7
                 a3: R_X86_64_PC32      .LC15-0x4
      a7:  c3                     retq
      a8:  0f 1f 84 00 00 00 00   nopl   0x0(%rax,%rax,1)
      af:  00
      b0:  48 8d 05 00 00 00 00   lea    0x0(%rip),%rax        # b7
                 b3: R_X86_64_PC32      .LC14-0x4
      b7:  c3                     retq
      b8:  0f 1f 84 00 00 00 00   nopl   0x0(%rax,%rax,1)
      bf:  00
      c0:  48 8d 05 00 00 00 00   lea    0x0(%rip),%rax        # c7
                 c3: R_X86_64_PC32      .LC13-0x4
      c7:  c3                     retq
      c8:  0f 1f 84 00 00 00 00   nopl   0x0(%rax,%rax,1)
      cf:  00
      d0:  48 8d 05 00 00 00 00   lea    0x0(%rip),%rax        # d7
                 d3: R_X86_64_PC32      .LC12-0x4
      d7:  c3                     retq
      d8:  0f 1f 84 00 00 00 00   nopl   0x0(%rax,%rax,1)
      df:  00
      e0:  48 8d 05 00 00 00 00   lea    0x0(%rip),%rax        # e7
                 e3: R_X86_64_PC32      .LC11-0x4
      e7:  c3                     retq
      e8:  0f 1f 84 00 00 00 00   nopl   0x0(%rax,%rax,1)
      ef:  00
      f0:  48 8d 05 00 00 00 00   lea    0x0(%rip),%rax        # f7
                 f3: R_X86_64_PC32      .LC10-0x4
      f7:  c3
[Bug target/82303] Better PIE/PIC code generation for kernel code (x86_64 & arm64)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82303

--- Comment #18 from Thomas Garnier ---
OK. Opened: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84011
[Bug target/82303] New: Better PIE/PIC code generation for kernel code (x86_64 & arm64)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82303

            Bug ID: 82303
           Summary: Better PIE/PIC code generation for kernel code (x86_64 & arm64)
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: thgarnie at google dot com
  Target Milestone: ---

The current PIE/PIC code generation is not optimal for kernel code. It makes assumptions about the execution environment that do not hold for freestanding executables such as the Linux kernel: the need to avoid text relocations, to minimize the footprint of CoWed pages, and to always refer to exported symbols via the GOT so they can be preempted. None of these concerns apply to freestanding binaries.

A separate flag (like -mcmodel=kernel-pie or -fkernel-pie) would allow better code generation for PIE/PIC kernel code, notably:

- Select the right segment register for TLS in kernel code (for example, x86_64 uses gs instead of fs [1]).
- No need for a GOT or PLT.
- Re-enable code optimizations that were disabled for COW page support in an effort to reduce relocations in code sections. For example, switch statements are not folded for PIE/PIC code to avoid relocations [2].

Note that arm64 PIE uses the small or tiny mcmodel based on UEFI, so this should be taken into consideration for that architecture.

For reference, the discussion on the Linux kernel x86_64 PIE RFC:
http://www.openwall.com/lists/kernel-hardening/2017/09/21/16

[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81708
[2] https://github.com/gcc-mirror/gcc/blob/7977b0509f07e42fbe0f06efcdead2b7e4a5135f/gcc/tree-switch-conversion.c#L828