[Bug target/95285] New: AArch64:aarch64 medium code model proposal

bule1 at huawei dot com Fri, 22 May 2020 23:41:30 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95285


            Bug ID: 95285
           Summary: AArch64:aarch64 medium code model proposal
           Product: gcc
           Version: 11.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: bule1 at huawei dot com
  Target Milestone: ---

Created attachment 48584
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=48584&action=edit
proposed patch

I would like to propose an implementation of the medium code model for AArch64.
A prototype is attached; it passes bootstrap and the regression tests.

-mcmodel=medium is a code model that is missing on AArch64 but supported on
x86. In this code model, small data is relocated as in the small code model,
while large data is relocated as in the large code model. The official
description of the medium code model is on page 34 of the x86-64 ABI document:
URL : https://refspecs.linuxbase.org/elf/x86_64-abi-0.99.pdf

The key difference between x86 and AArch64 is that x86 can use lea+movabs
instructions to implement a dynamically relocatable large code model.
Currently, the large code model on AArch64 relocates the symbol using an ldr
instruction, which only works for static linking. The small code model,
however, uses adrp + ldr instructions, which work for dynamic linking.
Therefore, the medium code model cannot be implemented by simply setting a
threshold. A dynamically relocatable large code model is needed first for a
functional medium code model.

I ran into this problem when compiling CESM, a climate forecasting package
that is widely used in the HPC field. In some configurations that manipulate
large arrays, a large code model with dynamic relocation is needed. The
following test case is abstracted from CESM to show this scenario.

program main
 common/baz/a,b,c
 real a,b,c
 b = 1.0
 call foo()
 print*, b
 end

 subroutine foo()
 common/baz/a,b,c
 real a,b,c

 integer, parameter :: nx = 1024
 integer, parameter :: ny = 1024
 integer, parameter :: nz = 1024
 integer, parameter :: nf = 1
 real :: bar(nf,nx*ny*nz)
 real :: bar1(nf,nx*ny*nz)
 bar = 0.0
 bar1 =0.0
 b = bar(1,1024*1024*100)
 b = bar1(1,1)

 return
 end

Compiling with -mcmodel=small -fPIC gives the following error, caused by the
access to the bar1 array:
test.f90:(.text+0x28): relocation truncated to fit:
R_AARCH64_ADR_PREL_PG_HI21 against `.bss'
test.f90:(.text+0x6c): relocation truncated to fit:
R_AARCH64_ADR_PREL_PG_HI21 against `.bss'
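
For reference, the sizes involved can be checked with a little arithmetic. This
is only a rough sketch: it assumes the default REAL kind is 4 bytes and uses
the architectural reach of ADRP (a signed 21-bit count of 4 KiB pages). The
variable names are illustrative and do not come from the patch.

```python
# Dimensions taken from the Fortran test case above.
nx = ny = nz = 1024
nf = 1
REAL_BYTES = 4  # assumption: default REAL kind occupies 4 bytes

# Size in bytes of bar (bar1 has the same size).
array_bytes = nf * nx * ny * nz * REAL_BYTES

# ADRP encodes a signed 21-bit page count with 4 KiB pages, so the
# largest forward distance it can express is just under 2**32 bytes.
ADRP_REACH = 2**20 * 4096

print(array_bytes)                # 4294967296
print(array_bytes >= ADRP_REACH)  # True
```

Each array alone is as large as the whole forward range of adrp, so whichever
array the linker places second in .bss is likely to be out of reach of an
R_AARCH64_ADR_PREL_PG_HI21 relocation.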

Compiling with -mcmodel=large -fPIC gives an "unimplemented" error:
f951: sorry, unimplemented: code model ‘large’ with ‘-fPIC’

As discussed at the beginning, to tackle this problem we first have to solve
the problem that the large code model is static only. My solution here is to
use the R_AARCH64_MOVW_PREL_Gx group relocations together with instructions
that calculate the current PC value.

Before the change (-mcmodel=small):
adrp    x0, bar1.2782
add     x0, x0, :lo12:bar1.2782

After the change (proposed -mcmodel=medium):
movz    x0, :prel_g3:bar1.2782
movk    x0, :prel_g2_nc:bar1.2782
movk    x0, :prel_g1_nc:bar1.2782
movk    x0, :prel_g0_nc:bar1.2782
adr     x1, .
sub     x1, x1, 0x4
add     x0, x0, x1

The first four instructions (one movz and three movk) calculate the 64-bit
offset between bar1 and the last movk instruction, which fulfils the
requirement of the large code model (64-bit relocation).
The adr+sub instructions calculate the address of the last movk instruction.
By adding the offset to that address, bar1 can be located dynamically.
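
The arithmetic of the sequence can be emulated as follows. This is a minimal
sketch, not the patch itself: it assumes that all four group values are taken
from one offset measured from the last movk (which is what the linker change
described below is meant to achieve), and the helper names split_groups, movk
and run_sequence are hypothetical.

```python
MASK64 = (1 << 64) - 1

def split_groups(offset):
    """Split a signed 64-bit offset into its 16-bit groups G3, G2, G1, G0."""
    u = offset & MASK64  # two's complement view of the offset
    return [(u >> shift) & 0xFFFF for shift in (48, 32, 16, 0)]

def movk(reg, imm16, shift):
    """Emulate movk: replace one 16-bit field of reg, keep the other bits."""
    return (reg & ~(0xFFFF << shift) & MASK64) | (imm16 << shift)

def run_sequence(symbol_addr, movz_addr):
    """Emulate the 7-instruction sequence whose movz sits at movz_addr."""
    last_movk_addr = movz_addr + 12  # movz plus three movk, 4 bytes each
    adr_addr = movz_addr + 16
    # Assumption: every group is derived from symbol - address of last movk.
    g3, g2, g1, g0 = split_groups(symbol_addr - last_movk_addr)
    x0 = g3 << 48               # movz x0, :prel_g3:sym
    x0 = movk(x0, g2, 32)       # movk x0, :prel_g2_nc:sym
    x0 = movk(x0, g1, 16)       # movk x0, :prel_g1_nc:sym
    x0 = movk(x0, g0, 0)        # movk x0, :prel_g0_nc:sym
    x1 = adr_addr               # adr  x1, .
    x1 = (x1 - 4) & MASK64      # sub  x1, x1, 0x4
    return (x0 + x1) & MASK64   # add  x0, x0, x1

# A symbol far above the code and one below it both resolve correctly.
print(hex(run_sequence(0x123456789A000, 0x400000)))
print(hex(run_sequence(0x1000, 0x7FFF00000000)))
```

Because the offset is reassembled modulo 2^64, the same sequence works for
symbols both above and below the code, at any distance.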

Because this relocation sequence is time consuming, a threshold on the size of
the data decides which objects are relocated this way, as on x86. The default
value of the threshold is set to 65536, which is the maximum relocation
capability of the small code model.
This implementation also needs a change to the linker in binutils so that all
4 mov instructions calculate the same PC offset, the one relative to the last
movk instruction.

The advantage of this implementation is that it can use existing relocation
types to prototype a medium code model.

This implementation also has drawbacks.
First, the 4 mov instructions and the adr instruction must be emitted together
in this order. No other instruction may be inserted into the sequence,
otherwise the calculated symbol address will be wrong. This might impede
instruction scheduling optimizations.
Second, the linker needs a corresponding change so that every mov instruction
calculates the same PC offset. For example, in my implementation, the first
movz instruction needs to add 12 to the result of ":prel_g3:bar1.2782" to make
up the PC offset.

I haven't figured out a suitable solution for these problems yet. Suggestions
regarding these issues are very welcome.
