On Thu, Sep 29, 2011 at 3:15 PM, Tom de Vries <tom_devr...@mentor.com> wrote:
> On 09/28/2011 11:53 AM, Richard Guenther wrote:
>> On Wed, Sep 28, 2011 at 11:34 AM, Tom de Vries <tom_devr...@mentor.com> wrote:
>>> Richard,
>>>
>>> I got a patch for PR50527.
>>>
>>> The patch prevents the alignment of vla-related allocas to be set to
>>> BIGGEST_ALIGNMENT in ccp. The alignment may turn out smaller after folding
>>> the alloca.
>>>
>>> Bootstrapped and regtested on x86_64.
>>>
>>> OK for trunk?
>>
>> Hmm. As gfortran with -fstack-arrays uses VLAs it's probably bad that
>> the vectorizer then will no longer see that the arrays are properly aligned.
>>
>> I'm not sure what the best thing to do is here, other than trying to record
>> the alignment requirement of the VLA somewhere.
>>
>> Forcing the alignment of the alloca replacement decl to BIGGEST_ALIGNMENT
>> has the issue that it will force stack-realignment which isn't free (and the
>> point was to make the decl cheaper than the alloca). But that might
>> possibly be the better choice.
>>
>> Any other thoughts?
>
> How about the approach in this (untested) patch? Using the DECL_ALIGN of the
> vla for the new array prevents stack realignment for folded vla-allocas, also
> for large vlas.
>
> This will not help in vectorizing large folded vla-allocas, but I think it's
> not reasonable to expect BIGGEST_ALIGNMENT when writing a vla (although that
> has been the case up until we started to fold). If you want to trigger
> vectorization for a vla, you can still use the aligned attribute on the
> declaration.
>
> Still, the unfolded vla-allocas will have BIGGEST_ALIGNMENT, also without
> using an attribute on the decl. This patch exploits this by setting it at the
> end of the 3rd pass_ccp, renamed to pass_ccp_last. This is not very effective
> in propagation though, because although the ptr_info of the lhs is propagated
> via copy_prop afterwards, it's not propagated anymore via ccp.
>
> Another way to do this would be to set BIGGEST_ALIGNMENT at the end of ccp2
> and not fold during ccp3.

Ugh, somehow I like this the least ;)

How about lowering VLAs to

  p = __builtin_alloca (...);
  p = __builtin_assume_aligned (p, DECL_ALIGN (vla));

and not assume anything for alloca itself if it feeds a
__builtin_assume_aligned?

Or rather introduce a __builtin_alloca_with_align () and for VLAs do

  p = __builtin_alloca_with_align (..., DECL_ALIGN (vla));

that's less awkward to use?

Sorry for not having a clear plan here ;)

Richard.

> Thanks,
> - Tom
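
For illustration, here is a rough user-level sketch of the first lowering suggested above (alloca followed by an alignment assertion). It is not the GIMPLE the compiler would emit, only an approximation in plain C. The helper name sum_squares is hypothetical, and _Alignof (double) stands in for DECL_ALIGN (vla) on the assumption that the VLA is a plain array of double with no aligned attribute. Note that __builtin_assume_aligned takes its alignment in bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: fills a scratch buffer with i*i for i in [0, n)
   and returns the sum.  The buffer is obtained the way a lowered VLA
   "double p[n]" might be under the first suggestion.  */
double sum_squares(size_t n)
{
    /* Step 1: plain alloca.  Nothing is assumed about its alignment yet. */
    void *raw = __builtin_alloca(n * sizeof(double));

    /* Sanity check: alloca memory is suitably aligned for double. */
    assert((uintptr_t)raw % _Alignof(double) == 0);

    /* Step 2: attach the alignment of the declared array (in bytes).
       This stands in for DECL_ALIGN (vla); it is an assumption of this
       sketch, not what the patch under discussion does.  */
    double *p = __builtin_assume_aligned(raw, _Alignof(double));

    double total = 0.0;
    for (size_t i = 0; i < n; i++) {
        p[i] = (double)i * (double)i;
        total += p[i];
    }
    return total;
}
```

The point of splitting the two steps is that the alignment fact lives on the second statement, so the alloca itself carries no promise that could become wrong once it is folded into a fixed-size array.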