[go-nuts] Re: Performance: Restrictions on arguments in registers in SSA implementation

Andrey Bokhanko Thu, 09 Jan 2025 03:58:19 -0800

+1: spills (and moving structures from registers to the stack is 
essentially spilling) should be decided by a specialized pass (the register 
allocator), not during SSA construction. Relying on a heuristic to do 
"pre-spilling" is, IMHO, a design flaw.


@Arseny Samoylov, have you tried disabling this restriction to see what 
happens performance-wise?

Two more ideas to try (as a kind of "middle ground"):
1) Take the total number of variables / big structures into account in 
"CanSSA".
2) For variables that are function arguments, use the ABI limits (which 
structures the ABI allows to be passed in registers on the target platform) 
rather than a plain 4*Widthptr as the limit in "CanSSA". This should solve 
Arseny's case.

Yours,
Andrey


On Wednesday, 25 December 2024 at 16:38:53 UTC+3, Arseny Samoylov wrote:

> Hello, thank you for your response. 
>
> I understand the concern about how many registers a single variable 
> consumes. However, I don’t fully understand why this affects SSA, or why we 
> preemptively decide to spill structures that are already laid out in 
> registers. As far as I understand, this should be a concern for the 
> register allocator, not earlier in the process.
>
> > In your example, those spills are either dead or kind of silly.
>
> Exactly! That's why I provided them =). The example with the Getter 
> function is my main point because it's a pretty common pattern.
>
> Just to clarify, here’s the example I mentioned earlier as a reminder:
> ```
> type A struct {
>     s1, s2 string
>     i1     int64
> }
>
> func (a A) GetInt() int64 {
>     return a.i1
> }
> ```
>
> This compiles to:
>
> ```
> f90007e0                MOVD R0, 8(RSP)
> f9000be1                MOVD R1, 16(RSP)
> f9000fe2                MOVD R2, 24(RSP)
> f90013e3                MOVD R3, 32(RSP)
> f90017e4                MOVD R4, 40(RSP)
> aa0403e0                MOVD R4, R0
> d65f03c0                RET
> ```
> On Tuesday, 17 December 2024 at 22:49:36 UTC+3 Keith Randall wrote:
>
>> I think most of what you are seeing is a mismatch between how a big 
>> struct is passed in the calling convention and how it is processed within a 
>> function by ssa.
>>
>> The calling convention lets larger structs be broken up and put in 
>> registers, if there are enough argument registers for it (which is an 
>> arch-dependent thing).
>> The total set of registers used is fixed, and those registers really 
>> can't be used for anything else at the call point, so there's no danger in 
>> overusing them.
>>
>> Inside a function, we can have many more such structs and there's no 
>> obvious way to pick which ones get registers and which don't.
>>
>> `type T struct { a,b,c,d,e int }`
>> `func f(x,y,z,p,q T) {}`
>>
>> Here it's obvious how to allocate registers. Some prefix of the argument 
>> list gets registers, the rest don't.
>> There's a fixed set of spill instructions needed to handle the rest.
>>
>> Whereas if we had
>> ```
>> func f() {
>>     var x, y, z, p, q T
>>     ...
>> }
>> ```
>> How do we decide which (parts of) variables get registers? How does that 
>> compete with other, non-large-struct register demands?
>> Because we don't have great answers to these questions, we want to be 
>> significantly more conservative in how many registers we let a single 
>> variable consume.
>>
>> All that said, I'm sure there are cases where we could do better. In your 
>> example, those spills are either dead or kind of silly.
>> On Thursday, December 12, 2024 at 6:13:23 AM UTC-8 Arseny Samoylov wrote:
>>
>>> If we're concerned about register pressure, perhaps we should look at 
>>> the total number of registers taken by arguments rather than just the size 
>>> of the arguments. Consider the following example:
>>>
>>> ```
>>> type MegaInt struct {
>>>     i1, i2, i3, i4, i5 int64
>>> }
>>>
>>> func foo(i1, i2, i3, i4, i5 int64) int64 {
>>>     return i1 + i2 + i3 + i4 + i5
>>> }
>>>
>>> func bar(i MegaInt) int64 {
>>>     return i.i1 + i.i2 + i.i3 + i.i4 + i.i5
>>> }
>>> ```
>>>
>>> This compiles to:
>>> ```
>>> TEXT command-line-arguments.foo(SB)
>>>   8b000021                ADD R0, R1, R1
>>>   8b010041                ADD R1, R2, R1
>>>   8b010061                ADD R1, R3, R1
>>>   8b010080                ADD R1, R4, R0
>>>   d65f03c0                RET
>>>
>>> TEXT command-line-arguments.bar(SB)
>>>   f90007e0                MOVD R0, 8(RSP)
>>>   f9000be1                MOVD R1, 16(RSP)
>>>   f9000fe2                MOVD R2, 24(RSP)
>>>   f90013e3                MOVD R3, 32(RSP)
>>>   f90017e4                MOVD R4, 40(RSP)
>>>   f94007e5                MOVD 8(RSP), R5
>>>   8b0100a1                ADD R1, R5, R1
>>>   8b010041                ADD R1, R2, R1
>>>   8b010061                ADD R1, R3, R1
>>>   8b010080                ADD R1, R4, R0
>>>   d65f03c0                RET
>>> ```
>>> On Thursday, 12 December 2024 at 12:53:50 UTC+3 Arseny Samoylov wrote:
>>>
>>>> Hi everybody!
>>>>
>>>> Recently, I noticed that there are some restrictions on the arguments 
>>>> passed to functions in registers.
>>>>
>>>> For example, if `a` is a struct, it must have fewer than 5 fields, and 
>>>> its size must not exceed `4 * ptrsz`. You can find these restrictions in 
>>>> `cmd/compile/internal/ssa/value.go` at line 590 in the `CanSSA` function:
>>>>
>>>> ```
>>>> // CanSSA reports whether values of type t can be represented as a Value.
>>>> func CanSSA(t *types.Type) bool {
>>>>     types.CalcSize(t)
>>>>     if t.Size() > int64(4*types.PtrSize) {
>>>>         // 4*Widthptr is an arbitrary constant. We want it
>>>>         // to be at least 3*Widthptr so slices can be registerized.
>>>>         // Too big and we'll introduce too much register pressure.
>>>>         return false
>>>>     }
>>>>     switch t.Kind() {
>>>>     ...
>>>>     case types.TSTRUCT:
>>>>         if t.NumFields() > MaxStruct { // MaxStruct = 4
>>>>             return false
>>>>         }
>>>>     }
>>>> }
>>>> ```
>>>>
>>>> Consider the following example:
>>>>
>>>> ```
>>>> type A struct {
>>>>     s1, s2 string
>>>>     i1     int64
>>>> }
>>>>
>>>> func (a A) GetInt() int64 {
>>>>     return a.i1
>>>> }
>>>> ```
>>>>
>>>> This compiles to:
>>>>
>>>> ```
>>>> f90007e0                MOVD R0, 8(RSP)
>>>> f9000be1                MOVD R1, 16(RSP)
>>>> f9000fe2                MOVD R2, 24(RSP)
>>>> f90013e3                MOVD R3, 32(RSP)
>>>> f90017e4                MOVD R4, 40(RSP)
>>>> aa0403e0                MOVD R4, R0
>>>> d65f03c0                RET
>>>> ```
>>>>
>>>> In the recently merged changes 
>>>> [CL#611075](https://go-review.googlesource.com/c/go/+/611075/4) and 
>>>> [CL#611076](https://go-review.googlesource.com/c/go/+/611076/6), support 
>>>> was added for making structs with any number of fields SSA-able. With 
>>>> these changes, I was able to remove the size restriction for structs that 
>>>> can be SSA-ized.
>>>>
>>>> Without these restrictions, the above example compiles to:
>>>>
>>>> ```
>>>> f90007e0                MOVD R0, 8(RSP)
>>>> f9000fe2                MOVD R2, 24(RSP)
>>>> aa0403e0                MOVD R4, R0
>>>> d65f03c0                RET
>>>> ```
>>>>
>>>> So, I am wondering: why does the restriction on size exist in the first 
>>>> place? It seems unreasonable to place the argument in registers only to 
>>>> later push it to the stack. The comment mentions that it helps reduce 
>>>> register pressure, but can't the register allocator decide to spill the 
>>>> argument if necessary? Also, if we’re preemptively pushing the structure 
>>>> to the stack, why not just pass it on the stack from the beginning?
>>>>
>>>> Thank you for your time and attention,  
>>>> Arseny.
>>>
>>>
