On 18/04/12 18:36, Ulrich Weigand wrote:
>
> Hello,
>
> I've been following up on the discussion we had on Monday regarding stack
> alignment, and noticed that I had mis-remembered the current state of
> affairs. Ramana asked me on Tuesday to provide a write-up of the actual
> status, so here we go ...
>
>
> To summarize the background of the problem: on ARM, the incoming stack
> pointer is only guaranteed to be aligned to an 8 byte boundary. This means
> that objects on the stack (local variables, spill slots, temporaries etc.)
> cannot easily be aligned to more than 8 bytes. This can potentially cause
> problems in two situations:
>
> 1) The object's default alignment (according to its type) is larger than 8
>    bytes
> 2) The object has a forced non-default alignment that is larger than 8
>    bytes
>
> The first situation should in theory never appear, since according to the
> ARM ABI all types have a default alignment of at most 8 bytes. However,
> due to the current mix-up in GCC, vector types actually are considered to
> have a 16-byte alignment requirement in GCC.
>
> The second situation can only appear with local variables that are declared
> using __attribute__ ((aligned)).
>
>
> We had discussed on Monday that we need to fix the second situation, since
> this can always occur and is supported on other platforms. By doing so,
> we would then automatically fix the first situation as well.
>
> However, this reasoning turns out to be incorrect. There are currently in
> GCC *two* completely separate mechanisms that can be used to align objects
> on the stack to larger than the ABI guaranteed stack pointer alignment:
>
> A) Re-alignment of the full stack frame. This is what is used by the Intel
> back-end (and only the Intel back-end). At function entry, generated code
> will align the stack pointer itself to whatever is necessary to fulfil
> alignment requirements of all objects on the stack. This may necessitate
> follow-on changes: the frame pointer, if there is one, will likewise need
> to be aligned at runtime. Also, since incoming stack arguments are now no
> longer at a fixed offset relative to the stack pointer *or* frame pointer
> in some cases, we might need an extra register as argument pointer. This
> method allows extra alignment for *any* object on the stack, but needs
> significant back-end support in order to be enabled on any non-Intel
> architecture.
>
> B) Dynamic allocation of selected stack variables. This is implemented by
> common code with no involvement of the back-end. In effect, the code in
> cfgexpand.c:expand_stack_vars that decides on how to allocate local
> variables on the stack will remove all variables that require extra
> alignment and place them into an extra structure. Generated prologue code
> will then in effect dynamically allocate and align that structure on the
> stack, and just store a pointer to it as "variable" into the normal stack
> frame. All other areas of the frame are unaffected.
> Since this method just simulates code the programmer could have written
> themselves using alloca, it does not require *any* back-end support and is
> enabled by default everywhere. However, it only works for regular local
> variables, and not for any other objects on the stack.

I read the C11 standard briefly a few months back, and I believe that B)
is all that is needed there. The standard excludes over-aligning
function arguments.

>
> Objects on the stack *except* local variables always use default alignment.
> Since on most platforms, except Intel and *currently* ARM, the ABI stack
> pointer alignment is sufficient to implement default alignments, method B)
> as above is able to fulfil all stack alignments. Intel uses method A), so
> they're also OK. In effect, it's only ARM due to the vector type
> alignment problem that runs into the situation that neither method works.
>
>
> Under those circumstances, given that:
> - we want to fix vector type alignment in order to become ABI compliant
> - once we've fixed this, we're in the same situation as other platforms and
>   method B) already fixes stack alignment problems
> - implementing method A) is therefore both quite involved *and* actually
>   superfluous
>
> I'd now rather recommend that we *don't* try to implement method A) (full
> stack-frame re-alignment) on ARM.
>
> Comments?
>

Yes, sounds like the right solution to me.

Technically, GCC's vector mechanism allows the creation of any size of
vector, which will be aligned to the size of the vector. We only run
into problems when that size exceeds the maximum alignment. Such values
passed by value to functions should also be over-aligned. I think if we
were to continue supporting such non-standard types we would have to
change the rules to pass them by reference and have caller copying.
We'd still need to deal with the 16-byte vectors somehow though.

So overall, I think the only practical solution is to limit vectors to
8-byte alignment.

R.
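
To make method B) above concrete, here is a rough sketch in C of what a
programmer "could have written themselves using alloca" to get a local
buffer aligned beyond what the stack pointer guarantees. It is only an
illustration of the idea, not what cfgexpand.c actually emits. The
names align_up and sum_with_overaligned_local, the 32-byte alignment
and the buffer size are all made up for the example, and <alloca.h> is
assumed to be available (as it is with glibc).

```c
#include <alloca.h>   /* alloca(); assumed available, e.g. with glibc */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: round p up to the next multiple of align.
   align must be a power of two. */
static void *align_up(void *p, size_t align)
{
    uintptr_t addr = (uintptr_t)p;
    return (void *)((addr + (align - 1)) & ~(uintptr_t)(align - 1));
}

/* Hand-written stand-in for a local declared as
       int buf[8] __attribute__ ((aligned (32)));
   on a target whose stack pointer is only guaranteed 8-byte aligned.
   Fills the buffer with 0..7 and returns the sum of its elements. */
int sum_with_overaligned_local(void)
{
    enum { ALIGN = 32, COUNT = 8 };

    /* Over-allocate by ALIGN - 1 bytes so that an aligned block of the
       requested size fits wherever the raw allocation happens to land. */
    void *raw = alloca(COUNT * sizeof(int) + ALIGN - 1);

    /* Only this pointer lives in the ordinary stack frame; the rest of
       the frame layout is unchanged. */
    int *buf = align_up(raw, ALIGN);
    assert((uintptr_t)buf % ALIGN == 0);

    int sum = 0;
    for (int i = 0; i < COUNT; i++) {
        buf[i] = i;
        sum += buf[i];
    }
    return sum;
}
```

Note that this only helps for named local variables. Spill slots,
temporaries and by-value arguments have no such pointer to redirect,
which is why over-aligned vector types remain a problem on ARM.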
>
> Mit freundlichen Gruessen / Best Regards
>
> Ulrich Weigand
>
> --
> Dr. Ulrich Weigand | Phone: +49-7031/16-3727
> STSM, GNU compiler and toolchain for Linux on System z and Cell/B.E.
> IBM Deutschland Research & Development GmbH
> Vorsitzende des Aufsichtsrats: Martina Koederitz | Geschäftsführung: Dirk
> Wittkopp
> Sitz der Gesellschaft: Böblingen | Registergericht: Amtsgericht
> Stuttgart, HRB 243294