[go-nuts] Re: Performance: Restrictions on arguments in registers in SSA implementation

'Keith Randall' via golang-nuts Tue, 17 Dec 2024 11:49:54 -0800

I think most of what you are seeing is a mismatch between how a big struct
is passed in the calling convention and how it is processed within a
function by SSA.


The calling convention lets larger structs be broken up and put in 
registers, if there are enough argument registers for it (which is an 
arch-dependent thing).
The total set of registers used is fixed, and those registers really can't 
be used for anything else at the call point, so there's no danger in 
overusing them.

Inside a function, we can have many more such structs and there's no 
obvious way to pick which ones get registers and which don't.

`type T struct{ a, b, c, d, e int }`
`func f(x, y, z, p, q T) {}`

Here it's obvious how to allocate registers: some prefix of the argument
list gets registers and the rest don't.
There's a fixed set of spill instructions needed to handle the rest.

Whereas if we had
```
func f() {
	var x, y, z, p, q T
	...
}
```
How do we decide which (parts of) variables get registers? How does that 
compete with other, non-large-struct register demands?
Because we don't have great answers to these questions, we want to be 
significantly more conservative in how many registers we let a single 
variable consume.

All that said, I'm sure there are cases where we could do better. In your 
example, those spills are either dead or kind of silly.
On Thursday, December 12, 2024 at 6:13:23 AM UTC-8 Arseny Samoylov wrote:

> If we're concerned about register pressure, perhaps we should look at the 
> total number of registers taken by arguments rather than just the size of 
> the arguments. Consider the following example:
>
> ```
> type MegaInt struct {
> 	i1, i2, i3, i4, i5 int64
> }
>
> func foo(i1, i2, i3, i4, i5 int64) int64 {
> 	return i1 + i2 + i3 + i4 + i5
> }
>
> func bar(i MegaInt) int64 {
> 	return i.i1 + i.i2 + i.i3 + i.i4 + i.i5
> }
> ```
>
> This compiles to:
> ```
> TEXT command-line-arguments.foo(SB)
>   8b000021    ADD R0, R1, R1
>   8b010041    ADD R1, R2, R1
>   8b010061    ADD R1, R3, R1
>   8b010080    ADD R1, R4, R0
>   d65f03c0    RET
>
> TEXT command-line-arguments.bar(SB)
>   f90007e0    MOVD R0, 8(RSP)
>   f9000be1    MOVD R1, 16(RSP)
>   f9000fe2    MOVD R2, 24(RSP)
>   f90013e3    MOVD R3, 32(RSP)
>   f90017e4    MOVD R4, 40(RSP)
>   f94007e5    MOVD 8(RSP), R5
>   8b0100a1    ADD R1, R5, R1
>   8b010041    ADD R1, R2, R1
>   8b010061    ADD R1, R3, R1
>   8b010080    ADD R1, R4, R0
>   d65f03c0    RET
> ```
> On Thursday, 12 December 2024 at 12:53:50 UTC+3 Arseny Samoylov wrote:
>
>> Hi everybody!
>>
>> Recently, I noticed that there are some restrictions on the arguments 
>> passed to functions in registers.
>>
>> For example, if `a` is a struct, it must have fewer than 5 fields, and
>> its size must not exceed `4 * ptrsz`. You can find these restrictions in
>> `cmd/compile/internal/ssa/value.go` at line 590 in the `CanSSA` function:
>>
>> ```
>> // CanSSA reports whether values of type t can be represented as a Value.
>> func CanSSA(t *types.Type) bool {
>> 	types.CalcSize(t)
>> 	if t.Size() > int64(4*types.PtrSize) {
>> 		// 4*Widthptr is an arbitrary constant. We want it
>> 		// to be at least 3*Widthptr so slices can be registerized.
>> 		// Too big and we'll introduce too much register pressure.
>> 		return false
>> 	}
>> 	switch t.Kind() {
>> 	...
>> 	case types.TSTRUCT:
>> 		if t.NumFields() > MaxStruct { // MaxStruct = 4
>> 			return false
>> 		}
>> 	}
>> }
>> ```
>>
>> Consider the following example:
>>
>> ```
>> type A struct {
>> 	s1, s2 string
>> 	i1     int64
>> }
>>
>> func (a A) GetInt() int64 {
>> 	return a.i1
>> }
>> ```
>>
>> This compiles to:
>>
>> ```
>> f90007e0                MOVD R0, 8(RSP)
>> f9000be1                MOVD R1, 16(RSP)
>> f9000fe2                MOVD R2, 24(RSP)
>> f90013e3                MOVD R3, 32(RSP)
>> f90017e4                MOVD R4, 40(RSP)
>> aa0403e0                MOVD R4, R0
>> d65f03c0                RET
>> ```
>>
>> In the recently merged changes [CL 611075](https://go-review.googlesource.com/c/go/+/611075/4)
>> and [CL 611076](https://go-review.googlesource.com/c/go/+/611076/6), support was added
>> for making structs with any number of fields SSA-able. With these changes,
>> I was able to remove the size restriction for structs that can be SSA-ized.
>>
>> Without these restrictions, the above example compiles to:
>>
>> ```
>> f90007e0                MOVD R0, 8(RSP)
>> f9000fe2                MOVD R2, 24(RSP)
>> aa0403e0                MOVD R4, R0
>> d65f03c0                RET
>> ```
>>
>> So, I am wondering: why does the restriction on size exist in the first 
>> place? It seems unreasonable to place the argument in registers only to 
>> later push it to the stack. The comment mentions that it helps reduce 
>> register pressure, but can't the register allocator decide to spill the 
>> argument if necessary? Also, if we’re preemptively pushing the structure to 
>> the stack, why not just pass it on the stack from the beginning?
>>
>> Thank you for your time and attention,  
>> Arseny.
>
>
