[Bug target/82303] Better PIE/PIC code generation for kernel code (x86_64 & arm64)

2018-01-17 Thread thgarnie at google dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82303

--- Comment #5 from Thomas Garnier  ---
I haven't tried the patch yet, but it could be a good starting point (it still
needs changes to switch optimization and segment registers). What is the
consequence of the change in default_binds_local_p_3? Is it supposed to remove
the need for a GOT / PLT?

[Bug target/82303] Better PIE/PIC code generation for kernel code (x86_64 & arm64)

2018-01-19 Thread thgarnie at google dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82303

--- Comment #7 from Thomas Garnier  ---
Created attachment 43189
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=43189&action=edit
testcase for mcmodel=large

Build with: gcc -mcmodel=large -c -fstatic-pie ./test.c -o test
Dump relocations on the object file: objdump -dr ./test

[Bug target/82303] Better PIE/PIC code generation for kernel code (x86_64 & arm64)

2018-01-19 Thread thgarnie at google dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82303

Thomas Garnier  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #43189|0                           |1
        is obsolete|                            |

--- Comment #8 from Thomas Garnier  ---
Created attachment 43190
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=43190&action=edit
testcase for mcmodel=large

[Bug target/82303] Better PIE/PIC code generation for kernel code (x86_64 & arm64)

2018-01-19 Thread thgarnie at google dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82303

--- Comment #9 from Thomas Garnier  ---
I tested the change against a modified version of the proposed Linux x86_64 PIE
support. The change removes all the PLT32 and GOT64 entries, but I still get
R_X86_64_GOTPC64 & R_X86_64_GOTOFF64 relocations in the head64.c file, which is
built with -mcmodel=large (to prevent odd logic on early boot with different
VA).

Do you think the suggested patch can be changed to remove these?

To repro, build the object file with: gcc -mcmodel=large -c -fstatic-pie
./test.c -o test

The objdump -dr output of the testcase:

 :
   0:   55                      push   %rbp
   1:   48 89 e5                mov    %rsp,%rbp
   4:   48 83 ec 20             sub    $0x20,%rsp
   8:   48 8d 05 f9 ff ff ff    lea    -0x7(%rip),%rax        # 8
   f:   49 bb 00 00 00 00 00    movabs $0x0,%r11
  16:   00 00 00
                        11: R_X86_64_GOTPC64    _GLOBAL_OFFSET_TABLE_+0x9
  19:   4c 01 d8                add    %r11,%rax
  1c:   89 7d ec                mov    %edi,-0x14(%rbp)
  1f:   48 89 75 e0             mov    %rsi,-0x20(%rbp)
  23:   48 ba 00 00 00 00 00    movabs $0x0,%rdx
  2a:   00 00 00
                        25: R_X86_64_GOTOFF64   _text-0x1023
  2d:   48 8d 14 10             lea    (%rax,%rdx,1),%rdx
  31:   89 55 fc                mov    %edx,-0x4(%rbp)
  34:   8b 55 ec                mov    -0x14(%rbp),%edx
  37:   48 63 d2                movslq %edx,%rdx
  3a:   48 8b 4d e0             mov    -0x20(%rbp),%rcx
  3e:   48 89 ce                mov    %rcx,%rsi
  41:   48 b9 00 00 00 00 00    movabs $0x0,%rcx
  48:   00 00 00
                        43: R_X86_64_GOTOFF64   _text
  4b:   48 8d 3c 08             lea    (%rax,%rcx,1),%rdi
  4f:   48 b9 00 00 00 00 00    movabs $0x0,%rcx
  56:   00 00 00
                        51: R_X86_64_GOTOFF64   memcpy
  59:   48 8d 04 08             lea    (%rax,%rcx,1),%rax
  5d:   ff d0                   callq  *%rax
  5f:   8b 45 fc                mov    -0x4(%rbp),%eax
  62:   c9                      leaveq
  63:   c3                      retq

[Bug target/82303] Better PIE/PIC code generation for kernel code (x86_64 & arm64)

2018-01-23 Thread thgarnie at google dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82303

--- Comment #11 from Thomas Garnier  ---
I think for this file using only -mcmodel=large makes more sense.

Given that the proposed option (-fstatic-pie) is not kernel specific, the TLS
change is not needed. What do you think about disabling optimizations like
switch folding [1]? It seems to exist only to remove relocations that are not
even needed in a classic -fPIE (or -fstatic-pie) scenario.

[1]
https://github.com/gcc-mirror/gcc/blob/7977b0509f07e42fbe0f06efcdead2b7e4a5135f/gcc/tree-switch-conversion.c#L828

[Bug target/82303] Better PIE/PIC code generation for kernel code (x86_64 & arm64)

2018-01-23 Thread thgarnie at google dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82303

--- Comment #13 from Thomas Garnier  ---
Created attachment 43223
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=43223&action=edit
testcase for switch folding

No switch folding if built with:

$CC -O2 -fno-PIE -c -o switch ./switch.c 

Switch folding if built with:

$CC -O2 -fPIE -c -o switch ./switch.c
or
$CC -O2 -fstatic-pie -c -o switch ./switch.c

[Bug target/82303] Better PIE/PIC code generation for kernel code (x86_64 & arm64)

2018-01-23 Thread thgarnie at google dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82303

--- Comment #14 from Thomas Garnier  ---
Correcting what I said before, it is about re-enabling switch folding (or
switch optimization).

Basically without PIE (-fno-PIE) with -O2, a switch can be optimized to be:

 :
   0:   b8 00 00 00 00          mov    $0x0,%eax
                        1: R_X86_64_32          .rodata.str1.1
   5:   83 ff 16                cmp    $0x16,%edi
   8:   77 0a                   ja     14
   a:   89 ff                   mov    %edi,%edi
   c:   48 8b 04 fd 00 00 00    mov    0x0(,%rdi,8),%rax
  13:   00
                        10: R_X86_64_32S        .rodata
  14:   c3                      retq

With PIE and -O2 it becomes:

 :
   0:   83 ff 16                cmp    $0x16,%edi
   3:   0f 87 87 01 00 00       ja     190
   9:   48 8d 15 00 00 00 00    lea    0x0(%rip),%rdx        # 10
                        c: R_X86_64_PC32        .rodata-0x4
  10:   89 ff                   mov    %edi,%edi
  12:   48 63 04 ba             movslq (%rdx,%rdi,4),%rax
  16:   48 01 d0                add    %rdx,%rax
  19:   ff e0                   jmpq   *%rax
  1b:   0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)
  20:   48 8d 05 00 00 00 00    lea    0x0(%rip),%rax        # 27
                        23: R_X86_64_PC32       .LC1-0x4
  27:   c3                      retq
  28:   0f 1f 84 00 00 00 00    nopl   0x0(%rax,%rax,1)
  2f:   00
  30:   48 8d 05 00 00 00 00    lea    0x0(%rip),%rax        # 37
                        33: R_X86_64_PC32       .LC22-0x4
  37:   c3                      retq
  38:   0f 1f 84 00 00 00 00    nopl   0x0(%rax,%rax,1)
  3f:   00
  40:   48 8d 05 00 00 00 00    lea    0x0(%rip),%rax        # 47
                        43: R_X86_64_PC32       .LC21-0x4
  47:   c3                      retq
  48:   0f 1f 84 00 00 00 00    nopl   0x0(%rax,%rax,1)
  4f:   00
  50:   48 8d 05 00 00 00 00    lea    0x0(%rip),%rax        # 57
                        53: R_X86_64_PC32       .LC20-0x4
  57:   c3                      retq
  58:   0f 1f 84 00 00 00 00    nopl   0x0(%rax,%rax,1)
  5f:   00
  60:   48 8d 05 00 00 00 00    lea    0x0(%rip),%rax        # 67
                        63: R_X86_64_PC32       .LC19-0x4
  67:   c3                      retq
  68:   0f 1f 84 00 00 00 00    nopl   0x0(%rax,%rax,1)
  6f:   00
  70:   48 8d 05 00 00 00 00    lea    0x0(%rip),%rax        # 77
                        73: R_X86_64_PC32       .LC18-0x4
  77:   c3                      retq
  78:   0f 1f 84 00 00 00 00    nopl   0x0(%rax,%rax,1)
  7f:   00
  80:   48 8d 05 00 00 00 00    lea    0x0(%rip),%rax        # 87
                        83: R_X86_64_PC32       .LC17-0x4
  87:   c3                      retq
  88:   0f 1f 84 00 00 00 00    nopl   0x0(%rax,%rax,1)
  8f:   00
  90:   48 8d 05 00 00 00 00    lea    0x0(%rip),%rax        # 97
                        93: R_X86_64_PC32       .LC16-0x4
  97:   c3                      retq
  98:   0f 1f 84 00 00 00 00    nopl   0x0(%rax,%rax,1)
  9f:   00
  a0:   48 8d 05 00 00 00 00    lea    0x0(%rip),%rax        # a7
                        a3: R_X86_64_PC32       .LC15-0x4
  a7:   c3                      retq
  a8:   0f 1f 84 00 00 00 00    nopl   0x0(%rax,%rax,1)
  af:   00
  b0:   48 8d 05 00 00 00 00    lea    0x0(%rip),%rax        # b7
                        b3: R_X86_64_PC32       .LC14-0x4
  b7:   c3                      retq
  b8:   0f 1f 84 00 00 00 00    nopl   0x0(%rax,%rax,1)
  bf:   00
  c0:   48 8d 05 00 00 00 00    lea    0x0(%rip),%rax        # c7
                        c3: R_X86_64_PC32       .LC13-0x4
  c7:   c3                      retq
  c8:   0f 1f 84 00 00 00 00    nopl   0x0(%rax,%rax,1)
  cf:   00
  d0:   48 8d 05 00 00 00 00    lea    0x0(%rip),%rax        # d7
                        d3: R_X86_64_PC32       .LC12-0x4
  d7:   c3                      retq
  d8:   0f 1f 84 00 00 00 00    nopl   0x0(%rax,%rax,1)
  df:   00
  e0:   48 8d 05 00 00 00 00    lea    0x0(%rip),%rax        # e7
                        e3: R_X86_64_PC32       .LC11-0x4
  e7:   c3                      retq
  e8:   0f 1f 84 00 00 00 00    nopl   0x0(%rax,%rax,1)
  ef:   00
  f0:   48 8d 05 00 00 00 00    lea    0x0(%rip),%rax        # f7
                        f3: R_X86_64_PC32       .LC10-0x4
  f7:   c3                      retq
  f8:   0f 1f 84 00 00 00 00    nopl   0x0(%rax,%rax,1)
  ff:   00
 100:   48 8d 05 00 00 00 00    lea    0x0(%rip),%rax        # 107
                        103: R_X86_64_PC32      .LC9-0x4
 107:   c3                      retq
 108:   0f 1f 84 00 00 00 00    nopl   0x0(%rax,%rax,1)
 10f:   00
 110:   48 8d 05 00 00 00 00    lea    0x0(%rip),%rax        # 117
                        113: R_X86_64_PC32      .LC8-0x4
 117:   c3                      retq
 118:   0f 1f 84 00 00 00 00    nopl   0x0(%rax,%rax,1)
 11f:   00
 120:   48 8d 05 00 00 00 00    lea    0x0(%rip

[Bug target/82303] Better PIE/PIC code generation for kernel code (x86_64 & arm64)

2018-01-23 Thread thgarnie at google dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82303

--- Comment #16 from Thomas Garnier  ---
Yes, I think you can't just default to the non-PIE mode.

Clang does it well though:

 :
   0:   83 ff 16                cmp    $0x16,%edi
   3:   77 0f                   ja     14
   5:   48 63 c7                movslq %edi,%rax
   8:   48 8d 0d 00 00 00 00    lea    0x0(%rip),%rcx        # f
                        b: R_X86_64_PC32        .data.rel.ro-0x4
   f:   48 8b 04 c1             mov    (%rcx,%rax,8),%rax
  13:   c3                      retq
  14:   48 8d 05 00 00 00 00    lea    0x0(%rip),%rax        # 1b
                        17: R_X86_64_PC32       .L.str.23-0x4
  1b:   c3                      retq
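
For readers following the listing, the Clang output above has the shape of a
bounds check followed by one indexed load from a table of pointers. A minimal
hand-written C sketch of that same shape is shown here; the function name, the
table name and the strings are hypothetical and are not taken from the
attached test case. Because each table entry needs a relocation, a
position-independent build typically places such a table in a relocated
read-only data section rather than in plain .rodata.

```c
#include <stddef.h>

/* Hypothetical table; the real test case has more entries. */
static const char *const names[] = { "zero", "one", "two", "three" };

/* One range check, then a single indexed load from the pointer table. */
const char *code_name_table(unsigned int code)
{
    if (code >= sizeof(names) / sizeof(names[0]))
        return "unknown";   /* out-of-range path, like the branch to 14 */
    return names[code];
}
```

Compiling this with -O2 -fPIE -c and inspecting it with objdump -dr is one way
to compare it with the listings in this thread.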

[Bug target/84011] New: Optimize switch table with -fPIE

2018-01-23 Thread thgarnie at google dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84011

            Bug ID: 84011
           Summary: Optimize switch table with -fPIE
           Product: gcc
           Version: 8.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: thgarnie at google dot com
  Target Milestone: ---

Created attachment 43225
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=43225&action=edit
Switch test case

With -O2 and -fPIE, switch tables are never optimized as well as they could be.
I assume that's to reduce the number of relocations in code sections for COW
page support. It makes sense for -fPIC but not for -fPIE. The Linux kernel
x86_64 PIE prototype ends up with big switch tables when they could be
optimized to a few lines.
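
As an illustration of the kind of code affected, here is a minimal sketch of a
switch in which every case only returns a constant string, so the whole
statement can in principle become a table lookup. It is a hypothetical
stand-in, not the attached test case; the function name and the strings are
invented for the example.

```c
/* Hypothetical example: each case yields a constant, so the compiler may
   replace the switch with a bounds check plus a table load. */
const char *code_name(unsigned int code)
{
    switch (code) {
    case 0: return "zero";
    case 1: return "one";
    case 2: return "two";
    case 3: return "three";
    case 4: return "four";
    case 5: return "five";
    default: return "unknown";
    }
}
```

Building it once with -O2 -fno-PIE -c and once with -O2 -fPIE -c, then
comparing the objdump -dr output, should show whether the compiler in use
folds the switch in each mode.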

With -fno-PIE -O2:

 :
   0:   b8 00 00 00 00          mov    $0x0,%eax
                        1: R_X86_64_32          .rodata.str1.1
   5:   83 ff 16                cmp    $0x16,%edi
   8:   77 0a                   ja     14
   a:   89 ff                   mov    %edi,%edi
   c:   48 8b 04 fd 00 00 00    mov    0x0(,%rdi,8),%rax
  13:   00
                        10: R_X86_64_32S        .rodata
  14:   c3                      retq

With -fPIE -O2:

 :
   0:   83 ff 16                cmp    $0x16,%edi
   3:   0f 87 87 01 00 00       ja     190
   9:   48 8d 15 00 00 00 00    lea    0x0(%rip),%rdx        # 10
                        c: R_X86_64_PC32        .rodata-0x4
  10:   89 ff                   mov    %edi,%edi
  12:   48 63 04 ba             movslq (%rdx,%rdi,4),%rax
  16:   48 01 d0                add    %rdx,%rax
  19:   ff e0                   jmpq   *%rax
  1b:   0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)
  20:   48 8d 05 00 00 00 00    lea    0x0(%rip),%rax        # 27
                        23: R_X86_64_PC32       .LC1-0x4
  27:   c3                      retq
  28:   0f 1f 84 00 00 00 00    nopl   0x0(%rax,%rax,1)
  2f:   00
  30:   48 8d 05 00 00 00 00    lea    0x0(%rip),%rax        # 37
                        33: R_X86_64_PC32       .LC22-0x4
  37:   c3                      retq
  38:   0f 1f 84 00 00 00 00    nopl   0x0(%rax,%rax,1)
  3f:   00
  40:   48 8d 05 00 00 00 00    lea    0x0(%rip),%rax        # 47
                        43: R_X86_64_PC32       .LC21-0x4
  47:   c3                      retq
  48:   0f 1f 84 00 00 00 00    nopl   0x0(%rax,%rax,1)
  4f:   00
  50:   48 8d 05 00 00 00 00    lea    0x0(%rip),%rax        # 57
                        53: R_X86_64_PC32       .LC20-0x4
  57:   c3                      retq
  58:   0f 1f 84 00 00 00 00    nopl   0x0(%rax,%rax,1)
  5f:   00
  60:   48 8d 05 00 00 00 00    lea    0x0(%rip),%rax        # 67
                        63: R_X86_64_PC32       .LC19-0x4
  67:   c3                      retq
  68:   0f 1f 84 00 00 00 00    nopl   0x0(%rax,%rax,1)
  6f:   00
  70:   48 8d 05 00 00 00 00    lea    0x0(%rip),%rax        # 77
                        73: R_X86_64_PC32       .LC18-0x4
  77:   c3                      retq
  78:   0f 1f 84 00 00 00 00    nopl   0x0(%rax,%rax,1)
  7f:   00
  80:   48 8d 05 00 00 00 00    lea    0x0(%rip),%rax        # 87
                        83: R_X86_64_PC32       .LC17-0x4
  87:   c3                      retq
  88:   0f 1f 84 00 00 00 00    nopl   0x0(%rax,%rax,1)
  8f:   00
  90:   48 8d 05 00 00 00 00    lea    0x0(%rip),%rax        # 97
                        93: R_X86_64_PC32       .LC16-0x4
  97:   c3                      retq
  98:   0f 1f 84 00 00 00 00    nopl   0x0(%rax,%rax,1)
  9f:   00
  a0:   48 8d 05 00 00 00 00    lea    0x0(%rip),%rax        # a7
                        a3: R_X86_64_PC32       .LC15-0x4
  a7:   c3                      retq
  a8:   0f 1f 84 00 00 00 00    nopl   0x0(%rax,%rax,1)
  af:   00
  b0:   48 8d 05 00 00 00 00    lea    0x0(%rip),%rax        # b7
                        b3: R_X86_64_PC32       .LC14-0x4
  b7:   c3                      retq
  b8:   0f 1f 84 00 00 00 00    nopl   0x0(%rax,%rax,1)
  bf:   00
  c0:   48 8d 05 00 00 00 00    lea    0x0(%rip),%rax        # c7
                        c3: R_X86_64_PC32       .LC13-0x4
  c7:   c3                      retq
  c8:   0f 1f 84 00 00 00 00    nopl   0x0(%rax,%rax,1)
  cf:   00
  d0:   48 8d 05 00 00 00 00    lea    0x0(%rip),%rax        # d7
                        d3: R_X86_64_PC32       .LC12-0x4
  d7:   c3                      retq
  d8:   0f 1f 84 00 00 00 00    nopl   0x0(%rax,%rax,1)
  df:   00
  e0:   48 8d 05 00 00 00 00    lea    0x0(%rip),%rax        # e7
                        e3: R_X86_64_PC32       .LC11-0x4
  e7:   c3                      retq
  e8:   0f 1f 84 00 00 00 00    nopl   0x0(%rax,%rax,1)
  ef:   00
  f0:   48 8d 05 00 00 00 00    lea    0x0(%rip),%rax        # f7
                        f3: R_X86_64_PC32       .LC10-0x4
  f7:   c3

[Bug target/82303] Better PIE/PIC code generation for kernel code (x86_64 & arm64)

2018-01-23 Thread thgarnie at google dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82303

--- Comment #18 from Thomas Garnier  ---
Ok. Opened: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84011

[Bug target/82303] New: Better PIE/PIC code generation for kernel code (x86_64 & arm64)

2017-09-22 Thread thgarnie at google dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82303

            Bug ID: 82303
           Summary: Better PIE/PIC code generation for kernel code (x86_64
                    & arm64)
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: thgarnie at google dot com
  Target Milestone: ---

The current PIE/PIC code generation is not optimal for kernel code.

It makes inferences about the execution environment which do not hold for
freestanding executables such as the Linux kernel, regarding the need to avoid
text relocations, to minimize the footprint of CoWed pages, and to always refer
to exported symbols via the GOT so they can be preempted. None of these
concerns apply to freestanding binaries.

Having a separate flag (like mcmodel=kernel-pie or -fkernel-pie) would allow
better code optimization for PIE/PIC kernel code, notably:

- Select the right segment register for TLS in kernel code (for example, x86_64
uses gs instead of fs [1]).
- No need for GOT or PLT.
- Re-enable code optimizations that were disabled for COW page support in order
to reduce relocations to code sections. For example, switches are not folded
for PIE/PIC code to avoid relocations [2].
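
To make the GOT point above concrete, here is a small sketch of an existing,
per-symbol mechanism that has a related effect: ELF hidden visibility. This is
not the flag proposed in this report, only an illustration; the variable and
function names are hypothetical. With -fPIC, a global of default visibility is
normally addressed through the GOT so that it can be preempted, whereas a
symbol marked hidden is known to bind locally and can be addressed
RIP-relatively.

```c
/* Hypothetical kernel-style global. Hidden visibility tells the compiler
   the symbol cannot be preempted, so no GOT indirection is required. */
__attribute__((visibility("hidden"))) int boot_flags = 3;

/* Reads the global; the interesting part is the addressing mode the
   compiler chooses, not the value returned. */
int read_boot_flags(void)
{
    return boot_flags;
}
```

Compiling with -O2 -fPIC -c, with and without the attribute, and comparing the
relocations reported by objdump -dr is one way to observe the difference.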

Note that arm64 PIE uses the small or tiny mcmodel based on UEFI, so this should
be taken into consideration for that architecture.

For reference the discussion on Linux kernel x86_64 PIE RFC:
http://www.openwall.com/lists/kernel-hardening/2017/09/21/16

[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81708
[2]
https://github.com/gcc-mirror/gcc/blob/7977b0509f07e42fbe0f06efcdead2b7e4a5135f/gcc/tree-switch-conversion.c#L828