Hi Steve,

> This patch checks for SIMD functions and saves the extra registers when
> needed.  It does not change the caller behavour, so with just this patch
> there may be values saved by both the caller and callee.  This is not
> efficient, but it is correct code.

I tried a few simple test cases. It seems calls to non-vector functions
don't mark the callee-saves as needing to be saved/restored:

void g(void);

void __attribute__ ((aarch64_vector_pcs))
f1 (void)
{
  g();
  g();
}

f1:
        str     x30, [sp, -16]!
        bl      g
        ldr     x30, [sp], 16
        b       g

Here I would expect q8-q23 to be preserved and no tailcall to g() since it is
not a vector function. This is important for correctness since f1 must
preserve q8-q23.

// compile with -O2 -ffixed-d1 -ffixed-d2 -ffixed-d3 -ffixed-d4 -ffixed-d5 -ffixed-d6 -ffixed-d7

float __attribute__ ((aarch64_vector_pcs))
f2 (float *p)
{
  float t0 = p[1];
  float t1 = p[3];
  float t2 = p[5];
  return t0 - t1 * (t1 + t0) + (t2 * t0);
}

f2:
        stp     d16, d17, [sp, -48]!
        ldr     s17, [x0, 4]
        ldr     s18, [x0, 12]
        ldr     s0, [x0, 20]
        fadd    s16, s17, s18
        fmsub   s16, s16, s18, s17
        fmadd   s0, s17, s0, s16
        ldp     d16, d17, [sp], 48
        ret

This uses s16-s18 when it should prefer to use s24-s31 first. Also it needs
to save q16-q18, not only d16 and d17.

Btw the -ffixed-d* is useful to block the register allocator from using
certain registers.

Wilco