Hello community, I recently noticed that gc generates a lot of JMP instructions
whose target is a RET, and nothing optimizes them away. Consider this example:
```
// asm_arm64.s
#include "textflag.h"
TEXT ·jmp_to_ret(SB), NOSPLIT, $0-0
    JMP ret
ret:
    RET
```
This compiles to:
```
TEXT main.jmp_to_ret.abi0(SB) asm_arm64.s
asm_arm64.s:4 0x77530 14000001 JMP 1(PC)
asm_arm64.s:6 0x77534 d65f03c0 RET
```
Obviously, this can be reduced to a single RET instruction.
So I made a patch that replaces a JMP to a RET with a RET instruction (on the
Prog representation):
```
diff --git a/src/cmd/internal/obj/pass.go b/src/cmd/internal/obj/pass.go
index 066b779539..87f1121641 100644
--- a/src/cmd/internal/obj/pass.go
+++ b/src/cmd/internal/obj/pass.go
@@ -174,8 +174,16 @@ func linkpatch(ctxt *Link, sym *LSym, newprog ProgAlloc) {
 			continue
 		}
 		p.To.SetTarget(brloop(p.To.Target()))
-		if p.To.Target() != nil && p.To.Type == TYPE_BRANCH {
-			p.To.Offset = p.To.Target().Pc
+		if p.To.Target() != nil {
+			if p.As == AJMP && p.To.Target().As == ARET {
+				p.As = ARET
+				p.To = p.To.Target().To
+				continue
+			}
+
+			if p.To.Type == TYPE_BRANCH {
+				p.To.Offset = p.To.Target().Pc
+			}
 		}
 	}
 }
```
You can find this patch on my GH
<https://github.com/ArsenySamoylov/go/tree/obj-linkpatch-jmp-to-ret>.
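To make the idea easier to discuss outside the toolchain, here is a minimal toy sketch of the same rewrite. The `Op` and `Prog` types and the `foldJmpToRet` function are hypothetical stand-ins invented for illustration; they are not the real `cmd/internal/obj` types, and the real patch additionally copies the target's `To` operand.

```go
package main

import "fmt"

// Op is a toy opcode; it only loosely mirrors obj.As (assumption).
type Op int

const (
	OpNop Op = iota
	OpJmp
	OpRet
)

// Prog is a hypothetical, stripped-down stand-in for obj.Prog.
type Prog struct {
	Op     Op
	Target *Prog // branch target, nil for non-branches
	Link   *Prog // next instruction in program order
}

// foldJmpToRet rewrites every JMP whose target is a RET into a RET
// and reports how many instructions were rewritten.
func foldJmpToRet(head *Prog) int {
	n := 0
	for p := head; p != nil; p = p.Link {
		if p.Op == OpJmp && p.Target != nil && p.Target.Op == OpRet {
			p.Op = OpRet
			p.Target = nil
			n++
		}
	}
	return n
}

func main() {
	// Models the asm example: JMP ret; ret: RET
	ret := &Prog{Op: OpRet}
	jmp := &Prog{Op: OpJmp, Target: ret, Link: ret}
	fmt.Println(foldJmpToRet(jmp), jmp.Op == OpRet)
}
```

Note that the toy rewrite leaves the original RET in the list, just like the patch does.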
I ran into a few problems:
* Code size increases, because a RET in Prog form can expand into several
machine instructions (ldp, add, and ret on arm64, for example). For arm64
binaries, the .text section of a simple Go program that calls the function
above grows by 0x3D0 bytes, and the .text section of the go binary itself
grows by 0x2570 bytes (almost 10KB).
* Optimizing on the Prog representation is too late: the example above now
translates to:
```
TEXT main.jmp_to_ret.abi0(SB) asm_arm64.s
asm_arm64.s:4 0x77900 d65f03c0 RET
asm_arm64.s:6 0x77904 d65f03c0 RET
```
(no dead code elimination runs afterwards, so the original RET is left behind =( )
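The leftover RET could in principle be dropped by a reachability pass over the instruction list. Below is a rough sketch on a toy model, assuming only unconditional JMP, RET, and fall-through instructions exist. The `Prog` type and `removeUnreachable` are hypothetical names for illustration, and a real pass would also have to account for conditional branches, jump tables, and anything else that references a Prog.

```go
package main

import "fmt"

// Op is a toy opcode (assumption, not obj.As).
type Op int

const (
	OpNop Op = iota
	OpJmp // unconditional branch
	OpRet
)

// Prog is a hypothetical, stripped-down stand-in for obj.Prog.
type Prog struct {
	Op     Op
	Target *Prog // branch target, nil for non-branches
	Link   *Prog // next instruction in program order
}

// removeUnreachable unlinks instructions that cannot be reached from head
// and returns the number of removed instructions.
func removeUnreachable(head *Prog) int {
	reachable := map[*Prog]bool{}
	work := []*Prog{head}
	for len(work) > 0 {
		p := work[len(work)-1]
		work = work[:len(work)-1]
		if p == nil || reachable[p] {
			continue
		}
		reachable[p] = true
		if p.Target != nil {
			work = append(work, p.Target)
		}
		// JMP and RET never fall through to the next instruction.
		if p.Op != OpJmp && p.Op != OpRet {
			work = append(work, p.Link)
		}
	}
	removed := 0
	for p := head; p != nil; {
		if next := p.Link; next != nil && !reachable[next] {
			p.Link = next.Link // skip the unreachable instruction
			removed++
			continue
		}
		p = p.Link
	}
	return removed
}

func main() {
	// Models the output above: RET followed by an unreachable RET.
	second := &Prog{Op: OpRet}
	first := &Prog{Op: OpRet, Link: second}
	fmt.Println(removeUnreachable(first), first.Link == nil)
}
```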
So I am looking for ideas. Maybe this optimization should be done on the
SSA form instead, with some heuristics to avoid increasing code size.
I would also appreciate suggestions on what to benchmark this optimization
with. The bent benchmark suite takes far too long to run =(.
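As a starting point for such a heuristic, here is a rough size estimate for a single rewrite. It assumes 4-byte arm64 instructions, and `sizeDelta` and `shouldFold` are hypothetical helpers, not anything that exists in the toolchain. The `targetBecomesDead` flag also assumes some later pass really removes the RET once nothing reaches it.

```go
package main

import "fmt"

// sizeDelta estimates the change in code size, in bytes, from replacing one
// JMP with a copy of the return sequence it jumps to.
func sizeDelta(jmpBytes, epilogueBytes int, targetBecomesDead bool) int {
	delta := epilogueBytes - jmpBytes
	if targetBecomesDead {
		// The original return sequence can be removed as well.
		delta -= epilogueBytes
	}
	return delta
}

// shouldFold is a hypothetical heuristic: fold only when code does not grow.
func shouldFold(jmpBytes, epilogueBytes int, targetBecomesDead bool) bool {
	return sizeDelta(jmpBytes, epilogueBytes, targetBecomesDead) <= 0
}

func main() {
	// Frameless function: RET is one instruction and the old RET becomes dead.
	fmt.Println(sizeDelta(4, 4, true), shouldFold(4, 4, true))
	// Framed function: LDP + ADD + RET, and the old epilogue is still used.
	fmt.Println(sizeDelta(4, 12, false), shouldFold(4, 12, false))
}
```

Under these assumptions the frameless case saves 4 bytes, while copying a three-instruction epilogue that stays live costs 8 bytes per rewritten JMP.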
P.S. An example of a JMP to RET from the runtime:
```
TEXT runtime.strequal(SB) a/go/src/runtime/alg.go
…
alg.go:378 0x12eac 14000004 JMP 4(PC) // JMP to RET in Prog
alg.go:378 0x12eb0 f9400000 MOVD (R0), R0
alg.go:378 0x12eb4 f9400021 MOVD (R1), R1
alg.go:378 0x12eb8 97fffc72 CALL runtime.memequal(SB)
alg.go:378 0x12ebc a97ffbfd LDP -8(RSP), (R29, R30)
alg.go:378 0x12ec0 9100c3ff ADD $48, RSP, RSP
alg.go:378 0x12ec4 d65f03c0 RET
...
```