[Bug target/56592] New: [SH] Add vector ABI

olegendo at gcc dot gnu.org Sun, 10 Mar 2013 16:43:51 -0700


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56592




             Bug #: 56592
           Summary: [SH] Add vector ABI
    Classification: Unclassified
           Product: gcc
           Version: 4.8.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
        AssignedTo: unassig...@gcc.gnu.org
        ReportedBy: olege...@gcc.gnu.org
                CC: kkoj...@gcc.gnu.org
            Target: sh*-*-*





On SH there are a couple of ABI related issues which unfortunately cannot all
be fixed without breaking binary compatibility.  Hence the idea of adding a new
ABI which can be selected with a target option, -mabi=vector.  The already
existing ABIs could also be selected through this option:

-mrenesas -> -mabi=renesas
-mnorenesas -> -mabi=gnu





Some of the primary issues that the vector ABI is supposed to improve are:



----------------------

PR 13423

sh-elf: V4SFmode passed in integer registers



Float vectors, float arrays (of fixed size) or structs of floats, when passed
by value, should be passed in FP regs entirely.  The current ABI allows passing
of up to 8 FP regs (FR4..FR11), so there would be space to pass two 4D float
vectors.  It should also be possible to return a 4D float vector in registers.
Since FR0..FR11 are call clobbered, they can as well be used to return multiple
vectors.
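
As an illustration of the kind of code this concerns, here is a minimal sketch
using the GCC generic vector extension.  The names v4sf, dot4 and scale4 are
made up for this example.  With the proposed ABI the two arguments of dot4
would occupy exactly FR4..FR11; the C code itself does not depend on the ABI.

```c
/* GCC generic vector extension: four packed floats (V4SFmode).  */
typedef float v4sf __attribute__ ((vector_size (16)));

/* Two 4D float vectors passed by value: eight floats in total,
   which is the number of FP argument registers (FR4..FR11).  */
float
dot4 (v4sf a, v4sf b)
{
  v4sf p = a * b;                      /* element-wise multiply */
  return p[0] + p[1] + p[2] + p[3];
}

/* A 4D float vector returned by value.  */
v4sf
scale4 (v4sf v, float s)
{
  v4sf k = { s, s, s, s };
  return v * k;
}
```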



----------------------

PR 53513

SH Target: Add support for fschg and fpchg insns



Although this PR could be solved without breaking the ABI too much, there are

some issues which could be fixed in a new ABI.

The current approach is to use two global variables (__fpscr_values) in order

to perform FPU single/double mode switching.  The default FPU precision setting

is defined by an -m option.  Currently there are three such FPU default modes:

- double mode default

- single mode default

- single mode only



When changing the FPU mode the current FPSCR setting is overwritten with one of

the global values from __fpscr_values.  This is the fastest way (on non-SH4A)

to perform a mode switch, but it has some disadvantages.  One of them is PR

6526.  In general all information in FPSCR is lost after performing a mode

switch this way, e.g. it is not possible to read FPU exception causes after a

series of operations.  Moreover, in multi-threaded environments it is not

possible to set the default FPSCR setting (e.g. rounding mode or denormal

handling) for threads independently.  In order to minimize mode switches the

function signature can be taken into account when deciding the default FPU

precision for a particular function.  E.g. when a function has any double

precision arguments, it can be assumed that the function will use the double

values in some way.  Thus the default entry mode for such a function should be

'double'.  Similarly, for functions that return double values it can be better

to leave the function with 'double' mode.
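
The information loss described above can be modelled in plain C.  This is only
a sketch: the bit positions are assumed to follow the SH-4 FPSCR layout (PR in
bit 19, cause field in bits 12..17), and fpscr_values_sketch is a stand-in with
illustrative constants, not the real contents of __fpscr_values.

```c
#include <stdint.h>

/* Assumed SH-4 FPSCR layout.  */
#define FPSCR_PR     (UINT32_C (1) << 19)     /* 1 = double precision */
#define FPSCR_CAUSE  (UINT32_C (0x3f) << 12)  /* FPU exception cause field */

/* Stand-in for __fpscr_values; the constants are illustrative only.  */
static const uint32_t fpscr_values_sketch[2] = { 0, FPSCR_PR };

/* Current approach: the whole register is replaced by a global value.  */
static uint32_t
switch_by_overwrite (uint32_t fpscr, int want_double)
{
  (void) fpscr;                 /* the previous contents are discarded */
  return fpscr_values_sketch[want_double ? 1 : 0];
}

/* Alternative: change only the PR bit and keep all other bits.  */
static uint32_t
switch_by_pr_bit (uint32_t fpscr, int want_double)
{
  return want_double ? (fpscr | FPSCR_PR) : (fpscr & ~FPSCR_PR);
}
```

With the first function the cause bits and the rounding mode of the caller are
gone after the switch; with the second one they survive.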



Because of this, '-m4 -mabi=vector' and '-m4-single -mabi=vector' would
actually result in the same ABI.



It should also be possible to override the FPSCR.PR settings for function entry
and function leave via function attributes.  This can be useful e.g. in cases
where hand written asm FPU routines that expect certain settings are invoked
from C/C++ code.  E.g. code that uses the 'frchg' insn to flip the FPSCR.FR bit
on SH4 must be executed with FPSCR.PR = 0.





----------------------

PR 52441

Target: Double sign/zero extensions for function arguments



Values smaller than 32 bits that are passed in registers usually have undefined
high bits.  The standard GNU calling convention therefore performs sign/zero
extension of such values both before the function call and inside the function
itself.  The Renesas calling convention (-mrenesas), however, extends values
only inside the function.  Whether an extension is actually required at all
depends on how the value is used, which is known only inside the function.
Thus adopting the Renesas calling convention in this case is more efficient.
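
A small model of what callee-side extension means, assuming the argument
register is represented as a uint32_t whose upper bits hold garbage.  The
function names are hypothetical and only serve the illustration.

```c
#include <stdint.h>

/* Sign-extend the low 16 bits of a register; the upper bits are ignored.  */
static int32_t
sext16 (uint32_t reg)
{
  uint32_t low = reg & 0xffffu;
  return (int32_t) (low ^ 0x8000u) - 0x8000;
}

/* Zero-extend the low 8 bits of a register.  */
static uint32_t
zext8 (uint32_t reg)
{
  return reg & 0xffu;
}

/* A use that needs no extension at all: only the low 16 bits are stored,
   so the callee can skip the extension entirely.  */
static void
store16 (uint16_t *dst, uint32_t reg)
{
  *dst = (uint16_t) reg;
}
```

Only the callee can tell that store16 does not need an extension, which is the
reason why extending in the caller as well is wasted work in such cases.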





----------------------

Register ordering for arguments.



I don't remember in which PR this was mentioned but the current GNU calling

convention allocates FR registers on big endian like:

FR4 = arg0

FR5 = arg1

FR6 = arg2

FR7 = arg3

...



and on little endian:

FR4 = arg1

FR5 = arg0

FR6 = arg3

FR7 = arg2

...



This can make writing endian neutral asm code more complicated.  The ordering

for little endian should be the same as for big endian (which is also

equivalent to the -mrenesas ABI).
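
The two orderings listed above can be written as index mappings.  This sketch
assumes that the pairwise swap on little endian continues in the same way for
the registers hidden behind the '...' (FR8..FR11); the function names are made
up for the example.

```c
/* FR register number that holds the i-th 'float' argument (i = 0..7).  */

/* Current GNU convention on little endian: neighbours are swapped.  */
static int
gnu_little_endian_fr (int i)
{
  return 4 + (i ^ 1);
}

/* Big endian, -mrenesas and the proposed ordering: straight sequence.  */
static int
proposed_fr (int i)
{
  return 4 + i;
}
```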





----------------------

Alignment of double precision FP values.



Currently the default alignment for those is 32 bit and can be changed to 64

bit by the option -mdalign.  In order to be able to maximize the utilization of

64 bit fmov insns, 64 bit double alignment should be the default.
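
To show what the alignment choice means for data layout, here is a purely
arithmetic sketch for a struct { int a; double d; }.  It assumes a 4 byte 'int'
and an 8 byte 'double' and does not query any compiler; the helper names are
hypothetical.

```c
#include <stddef.h>

/* Round 'off' up to the next multiple of 'align'.  */
static size_t
align_up (size_t off, size_t align)
{
  return (off + align - 1) / align * align;
}

struct layout
{
  size_t d_offset;   /* offset of member 'd' */
  size_t size;       /* total size of the struct */
};

/* Layout of struct { int a; double d; } for a given 'double' alignment
   in bytes (4 = current default, 8 = -mdalign / proposed default).  */
static struct layout
layout_int_double (size_t double_align)
{
  struct layout l;
  size_t after_a = 4;                       /* 'int a' occupies bytes 0..3 */

  l.d_offset = align_up (after_a, double_align);
  l.size = align_up (l.d_offset + 8, double_align);
  return l;
}
```

With 32 bit alignment 'd' ends up at offset 4, which is not a multiple of 8;
with 64 bit alignment it is at offset 8 at the cost of 4 bytes of padding.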





----------------------

Boolean function return values



A boolean return value of a function tends to be produced inside the function

by using some sort of comparison insns which store the comparison result in the

T bit.  The T bit is then transferred to a GP reg before returning from the

function.  On the caller side, the value in the GP reg is then often tested for

!= 0 followed by a conditional branch.  The redundant != 0 test can be

eliminated by returning boolean values in the T bit directly.  However, there

might be compatibility problems with C code that typedefs its own bool type as

signed/unsigned char or something else.
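
The compatibility concern can be shown in C.  In this sketch my_bool is a
hypothetical legacy typedef; a real 'bool' is always 0 or 1 and thus fits into
a single bit, while my_bool can carry any 8 bit value and would still have to
be returned in a GP reg.

```c
#include <stdbool.h>

typedef unsigned char my_bool;   /* hypothetical user-defined bool type */

/* Typical candidate for a T bit return: the result comes straight from
   a comparison.  */
static bool
is_less (int a, int b)
{
  return a < b;
}

/* Conversion to 'bool' normalizes the value to 0 or 1.  */
static bool
flag_as_bool (int flags)
{
  return flags & 2;
}

/* The same expression with the typedef returns the raw value 0 or 2.  */
static my_bool
flag_as_my_bool (int flags)
{
  return flags & 2;
}
```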





----------------------

Variadic functions



Passing a variable number of arguments ('...') on the stack, as is currently
done with -mrenesas, tends to produce more efficient code, especially when
traversing the va_list.
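
For reference, this is the kind of va_list traversal that is meant, written
with the standard <stdarg.h> macros; sum_ints is just an example name.  If all
variadic arguments live on the stack, each va_arg step can in principle be a
simple pointer increment, although the actual code depends on the compiler.

```c
#include <stdarg.h>

/* Sum 'count' int arguments that follow the named parameter.  */
static int
sum_ints (int count, ...)
{
  va_list ap;
  int total = 0;

  va_start (ap, count);
  for (int i = 0; i < count; i++)
    total += va_arg (ap, int);    /* fetch the next variadic argument */
  va_end (ap);

  return total;
}
```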





----------------------

ABI summary I've got so far





R0..R3:      Call-clobbered.
             Function return values / scratch registers.
             High bits of values < 32 bit are undefined.

R4..R7:      Call-clobbered.
             Function arguments / scratch registers.
             High bits of values < 32 bit are undefined.

R8..R15:     Call-saved.

             R15: stack pointer
             R14: frame pointer (optional)
             R12: GOT pointer (optional, for PIC code)

PR:          Call-saved.
             Function return address.

SR.S:        '0' (MAC saturation disabled) at function entry and function
             leave.

SR.T:        Call-clobbered.
             Boolean return value.

SR.M, SR.Q:  Call-clobbered.

Other SR bits: Ignored by the compiler.

GBR:         Call-saved.
             Pointer to current execution context (thread).

MACL,MACH:   Call-clobbered.
             Scratch registers.

FPUL:        Call-clobbered.
             Scratch register.

FR0..FR3:    Call-clobbered.
             Function return values / scratch registers.

FR4..FR7:    Call-clobbered.
             Function arguments / return values / scratch registers.

FR8..FR11:   Call-clobbered.
             Function arguments / scratch registers.

FR12..FR15:  Call-saved.
             Local variables.

XF0..XF15:   Undefined, not modified by compiler generated code.

FPSCR.FR:    Undefined, not modified by compiler generated code.

FPSCR.SZ:    '0' (32 bit fmov) on function entry / leave by default.

FPSCR.PR:    Function entry:
             '0' (single precision) if the function takes no floating point
             arguments, or if the number of 'float' arguments is greater than
             the number of 'double' arguments, '1' otherwise.

             Function leave:
             Unmodified if the function returns 'void' or integral values or
             aggregates.
             '0' if the function returns more 'float' values than 'double'
             values, '1' otherwise.

             '0' on exception handler entry.

Other FPSCR bits: Undefined, not modified by compiler generated code.





When counting the number of 'float' and 'double' values, the elements of
vectors are counted as individual values.  I.e. a 4D 'float' vector has more
'float' values than a 2D 'double' vector has 'double' values.  va_args are
ignored.
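
The FPSCR.PR rules from the summary can be written down as two small
functions.  This is a sketch of the rule as stated, not of any existing GCC
code: the function names are hypothetical, the counts are expected to already
include vector elements, and 'unmodified' is read here as 'whatever PR holds at
that point is left alone'.

```c
/* FPSCR.PR expected at function entry: 0 = single, 1 = double.  */
static int
entry_pr (unsigned n_float_args, unsigned n_double_args)
{
  if (n_float_args == 0 && n_double_args == 0)
    return 0;                            /* no floating point arguments */
  return n_float_args > n_double_args ? 0 : 1;
}

/* FPSCR.PR at function leave.  'returns_fp' is zero for 'void', integral
   and aggregate return types.  */
static int
leave_pr (int current_pr, int returns_fp,
          unsigned n_float_rets, unsigned n_double_rets)
{
  if (!returns_fp)
    return current_pr;                   /* left unmodified */
  return n_float_rets > n_double_rets ? 0 : 1;
}
```

For a function taking one 4D 'float' vector and one 2D 'double' vector,
entry_pr (4, 2) yields 0, i.e. single precision.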




Function argument/return value aggregates are decomposed so that the individual

members can be passed in different register classes, based on the data type. 

E.g. 



struct FuncArg
{
  int a;     // -> r4
  int b;     // -> r5
  float c;   // -> fr4
};

struct FuncArg
{
  int a;     // -> r4
  int b;     // -> r5
  float c;   // -> fr4
  double d;  // -> dr6 (fr6:fr7)
  bool e;    // -> T
  float f;   // -> fr5
};

struct FuncArg
{
  int a;     // -> r4
  int b;     // -> r5
  int c;     // -> r6
  int d;     // -> r7
};

struct FuncArg
{
  int a;        // -> r4
  int b;        // -> r5
  int c;        // -> r6
  long long d;  // -> stack
  short e;      // -> r7
};

struct FuncArg
{
  float a;      // -> fr4
  float b;      // -> fr5
  float c;      // -> fr6
  float d;      // -> fr7
};
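
One possible allocation scheme that reproduces the examples above is sketched
below.  Several points are assumptions that the examples do not settle: a
'long long' takes any two adjacent free GP regs (no even/odd pairing), a second
'bool' falls back to a GP reg once T is taken, and skipped FP regs are
back-filled by later 'float' members.  All names are hypothetical.

```c
#include <stddef.h>
#include <stdio.h>

enum kind { K_INT, K_SHORT, K_LONG_LONG, K_BOOL, K_FLOAT, K_DOUBLE };

struct regs
{
  unsigned gp;   /* bit i set: R(4+i) is taken, i = 0..3 */
  unsigned fr;   /* bit i set: FR(4+i) is taken, i = 0..7 */
  int t;         /* non-zero: the T bit is taken */
};

/* Claim 'count' adjacent free registers out of 'total', starting at a
   multiple of 'align'.  Returns the first index, or -1 if nothing fits.  */
static int
claim (unsigned *used, int total, int count, int align)
{
  for (int i = 0; i + count <= total; i += align)
    {
      unsigned mask = ((1u << count) - 1u) << i;
      if ((*used & mask) == 0)
        {
          *used |= mask;
          return i;
        }
    }
  return -1;
}

/* Write the location chosen for one member of kind 'k' into 'out'.
   A 'long long' in registers is reported by its first register.  */
static void
place (struct regs *r, enum kind k, char *out, size_t n)
{
  const char *prefix = "r";
  int i;

  if (k == K_BOOL && !r->t)
    {
      r->t = 1;
      snprintf (out, n, "T");
      return;
    }

  switch (k)
    {
    case K_FLOAT:
      i = claim (&r->fr, 8, 1, 1);
      prefix = "fr";
      break;
    case K_DOUBLE:
      i = claim (&r->fr, 8, 2, 2);   /* aligned pair, e.g. fr6:fr7 */
      prefix = "dr";
      break;
    case K_LONG_LONG:
      i = claim (&r->gp, 4, 2, 1);
      break;
    default:                          /* int, short, further bools */
      i = claim (&r->gp, 4, 1, 1);
      break;
    }

  if (i < 0)
    snprintf (out, n, "stack");
  else
    snprintf (out, n, "%s%d", prefix, 4 + i);
}
```

Calling place once per member, in declaration order and with a zeroed
struct regs, gives the register names from the comments in the second and
fourth example.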





Return values/aggregates that don't fit into registers are returned partially
in registers and partially on the caller's stack.  In this case R2 is used to
pass the hidden pointer to the remaining return values.



Argument aggregates that don't fit into registers are passed partially in

registers and the remaining pieces are pushed onto the stack.



va_args are passed on the stack entirely (simpler traversal of va_list).



'double' values are passed in DR registers, where the high 32 bits are passed

in FR(n*2) and the low 32 bits in FR(n*2+1) regardless of the endian setting.
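
The split of a 'double' into the two halves of a DR register can be modelled
as follows.  The sketch assumes a 64 bit IEEE 754 'double'; struct dr_pair and
split_double are names invented for the example.

```c
#include <stdint.h>
#include <string.h>

struct dr_pair
{
  uint32_t even_fr;   /* FR(n*2):   high 32 bits */
  uint32_t odd_fr;    /* FR(n*2+1): low 32 bits */
};

/* Split a double into the two 32 bit halves, independent of the byte
   order of the machine that runs this code.  */
static struct dr_pair
split_double (double d)
{
  uint64_t bits;
  struct dr_pair p;

  memcpy (&bits, &d, sizeof bits);    /* bit pattern of the double */
  p.even_fr = (uint32_t) (bits >> 32);
  p.odd_fr = (uint32_t) bits;
  return p;
}
```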



4D 'float' vectors are passed in FV registers, i.e. FR(n*4), in order to avoid

reg copies before vector insns (fipr, ftrv).



SH targets that don't support double precision floating-point in hardware

handle the operations in software, but should accept the same ABI otherwise. 

This would fix e.g. PR 36939.





I'm not sure how to integrate untyped calls and whether this kind of ABI would
require additional extensions to GDB.  Probably there are also lots of other
details missing for this to be a complete ABI definition.  Any suggestions and
feedback are highly appreciated.
