On Wed, Oct 5, 2011 at 11:07 PM, Tom de Vries <tom_devr...@mentor.com> wrote: > On 10/05/2011 10:46 AM, Richard Guenther wrote: >> On Tue, Oct 4, 2011 at 6:28 PM, Tom de Vries <tom_devr...@mentor.com> wrote: >>> On 10/04/2011 03:03 PM, Richard Guenther wrote: >>>> On Tue, Oct 4, 2011 at 9:43 AM, Tom de Vries <tom_devr...@mentor.com> >>>> wrote: >>>>> On 10/01/2011 05:46 PM, Tom de Vries wrote: >>>>>> On 09/30/2011 03:29 PM, Richard Guenther wrote: >>>>>>> On Thu, Sep 29, 2011 at 3:15 PM, Tom de Vries <tom_devr...@mentor.com> >>>>>>> wrote: >>>>>>>> On 09/28/2011 11:53 AM, Richard Guenther wrote: >>>>>>>>> On Wed, Sep 28, 2011 at 11:34 AM, Tom de Vries >>>>>>>>> <tom_devr...@mentor.com> wrote: >>>>>>>>>> Richard, >>>>>>>>>> >>>>>>>>>> I got a patch for PR50527. >>>>>>>>>> >>>>>>>>>> The patch prevents the alignment of vla-related allocas to be set to >>>>>>>>>> BIGGEST_ALIGNMENT in ccp. The alignment may turn out smaller after >>>>>>>>>> folding >>>>>>>>>> the alloca. >>>>>>>>>> >>>>>>>>>> Bootstrapped and regtested on x86_64. >>>>>>>>>> >>>>>>>>>> OK for trunk? >>>>>>>>> >>>>>>>>> Hmm. As gfortran with -fstack-arrays uses VLAs it's probably bad that >>>>>>>>> the vectorizer then will no longer see that the arrays are properly >>>>>>>>> aligned. >>>>>>>>> >>>>>>>>> I'm not sure what the best thing to do is here, other than trying to >>>>>>>>> record >>>>>>>>> the alignment requirement of the VLA somewhere. >>>>>>>>> >>>>>>>>> Forcing the alignment of the alloca replacement decl to >>>>>>>>> BIGGEST_ALIGNMENT >>>>>>>>> has the issue that it will force stack-realignment which isn't free >>>>>>>>> (and the >>>>>>>>> point was to make the decl cheaper than the alloca). But that might >>>>>>>>> possibly be the better choice. >>>>>>>>> >>>>>>>>> Any other thoughts? >>>>>>>> >>>>>>>> How about the approach in this (untested) patch? Using the DECL_ALIGN >>>>>>>> of the vla >>>>>>>> for the new array prevents stack realignment for folded vla-allocas, >>>>>>>> also for >>>>>>>> large vlas. >>>>>>>> >>>>>>>> This will not help in vectorizing large folded vla-allocas, but I >>>>>>>> think it's not >>>>>>>> reasonable to expect BIGGEST_ALIGNMENT when writing a vla (although >>>>>>>> that has >>>>>>>> been the case up until we started to fold). If you want to trigger >>>>>>>> vectorization >>>>>>>> for a vla, you can still use the aligned attribute on the declaration. >>>>>>>> >>>>>>>> Still, the unfolded vla-allocas will have BIGGEST_ALIGNMENT, also >>>>>>>> without using >>>>>>>> an attribute on the decl. This patch exploits this by setting it at >>>>>>>> the end of >>>>>>>> the 3rd pass_ccp, renamed to pass_ccp_last. This is not very effective >>>>>>>> in >>>>>>>> propagation though, because although the ptr_info of the lhs is >>>>>>>> propagated via >>>>>>>> copy_prop afterwards, it's not propagated anymore via ccp. >>>>>>>> >>>>>>>> Another way to do this would be to set BIGGEST_ALIGNMENT at the end of >>>>>>>> ccp2 and >>>>>>>> not fold during ccp3. >>>>>>> >>>>>>> Ugh, somehow I like this the least ;) >>>>>>> >>>>>>> How about lowering VLAs to >>>>>>> >>>>>>> p = __builtin_alloca (...); >>>>>>> p = __builtin_assume_aligned (p, DECL_ALIGN (vla)); >>>>>>> >>>>>>> and not assume anything for alloca itself if it feeds a >>>>>>> __builtin_assume_aligned? >>>>>>> >>>>>>> Or rather introduce a __builtin_alloca_with_align () and for VLAs do >>>>>>> >>>>>>> p = __builtin_alloca_with_align (..., DECL_ALIGN (vla)); >>>>>>> >>>>>>> that's less awkward to use? >>>>>>> >>>>>>> Sorry for not having a clear plan here ;) >>>>>>> >>>>>> >>>>>> Using assume_aligned is a more orthogonal way to represent this in >>>>>> gimple, but >>>>>> indeed harder to use. >>>>>> >>>>>> Another possibility is to add a 'tree vla_decl' field to struct >>>>>> gimple_statement_call, which is probably the easiest to implement. >>>>>> >>>>>> But I think __builtin_alloca_with_align might have a use beyond vlas, so >>>>>> I >>>>>> decided to try this one. Attached patch implements my first stab at this >>>>>> (now >>>>>> testing on x86_64). >>>>>> >>>>>> Is this an acceptable approach? >>>>>> >>>>> >>>>> bootstrapped and reg-tested (including ada) on x86_64. >>>>> >>>>> Ok for trunk? >>>> >>>> The idea is ok I think. But >>>> >>>> case BUILT_IN_ALLOCA: >>>> + case BUILT_IN_ALLOCA_WITH_ALIGN: >>>> /* If the allocation stems from the declaration of a variable-sized >>>> object, it cannot accumulate. */ >>>> target = expand_builtin_alloca (exp, CALL_ALLOCA_FOR_VAR_P (exp)); >>>> if (target) >>>> return target; >>>> + if (DECL_FUNCTION_CODE (get_callee_fndecl (exp)) >>>> + == BUILT_IN_ALLOCA_WITH_ALIGN) >>>> + { >>>> + tree new_call = build_call_expr_loc (EXPR_LOCATION (exp), >>>> + >>>> built_in_decls[BUILT_IN_ALLOCA], >>>> + 1, CALL_EXPR_ARG (exp, 0)); >>>> + CALL_ALLOCA_FOR_VAR_P (new_call) = CALL_ALLOCA_FOR_VAR_P (exp); >>>> + exp = new_call; >>>> + } >>>> >>>> Ick. Why can't the rest of the compiler not handle >>>> BUILT_IN_ALLOCA_WITH_ALIGN the same as BUILT_IN_ALLOCA? >>>> (thus, arrange things so the assembler name of alloca-with-align is >>>> alloca?) >>>> >>> >>> We can set the assembler name in the local_define_builtin call. But that >>> will >>> still create a call alloca (12, 4). How do we deal with the second argument? >>> >>>> I don't see why you still need the special late CCP pass. >>>> >>> >>> For alloca_with_align, the align will minimally be the 2nd argument. This is >>> independent of folding, and we can propagate this information in every ccp. >>> >>> If the alloca_with_align is not folded and will not be folded anymore >>> (something >>> we know at the earliest after the propagation phase of the last ccp), >>> the alignment of BIGGEST_ALIGNMENT is guaranteed, because we expand it like >>> a >>> normal allloca. We try to exploit this by improving the ptr_info->align. >> >> I'd just assume the (lower) constant alignment of the argument always. >> >>> Well, that was the idea. But now I wonder, isn't it better to do this in >>> expand_builtin_alloca: >>> ... >>> /* Compute the argument. */ >>> op0 = expand_normal (CALL_EXPR_ARG (exp, 0)); >>> >>> + align = >>> + (DECL_FUNCTION_CODE (get_callee_fndecl (exp)) == >>> + BUILT_IN_ALLOCA_WITH_ALIGN) >>> + ? TREE_INT_CST_LOW (CALL_EXPR_ARG (exp, 1)) >>> + : BIGGEST_ALIGNMENT; >>> + >>> + >>> /* Allocate the desired space. */ >>> - result = allocate_dynamic_stack_space (op0, 0, BIGGEST_ALIGNMENT, >>> - cannot_accumulate); >>> + result = allocate_dynamic_stack_space (op0, 0, align, cannot_accumulate); >>> result = convert_memory_address (ptr_mode, result); >> >> Yes, that's probably a good idea anyway - it will reduce excess >> stack re-alignment for VLAs. >> >>> return result; >>> >>> ... >>> >>> This way, we lower the alignment requirement for vlas in general, not only >>> when >>> folding (unless we expand to an actual alloca call). >>> >>> If we do that, we don't know whether BIGGEST_ALIGNMENT is going to be >>> applicable >>> until expand, so we can't exploit that in the middle-end, so we lose the >>> need >>> for ccp_last. >> >> Good ;) >> > > Handled this way now. > >> Thanks, >> Richard. >> >>>> -static tree >>>> -fold_builtin_alloca_for_var (gimple stmt) >>>> +static bool >>>> +builtin_alloca_with_align_p (gimple stmt) >>>> { >>>> - unsigned HOST_WIDE_INT size, threshold, n_elem; >>>> - tree lhs, arg, block, var, elem_type, array_type; >>>> - unsigned int align; >>>> + tree fndecl; >>>> + if (!is_gimple_call (stmt)) >>>> + return false; >>>> + >>>> + fndecl = gimple_call_fndecl (stmt); >>>> + >>>> + return (fndecl != NULL_TREE >>>> + && DECL_FUNCTION_CODE (fndecl) == BUILT_IN_ALLOCA_WITH_ALIGN); >>>> +} >>>> >>>> equivalent to gimple_call_builtin_p (stmt, BUILT_IN_ALLOCA_WITH_ALIGN). >>>> >>>> + DECL_ALIGN (var) = TREE_INT_CST_LOW (gimple_call_arg (stmt, 1)); >>>> >>>> Now, users might call __builtin_alloca_with_align (8, align), thus use >>>> a non-constant alignment argument. You either need to reject this >>>> in c-family/c-common.c:check_builtin_function_arguments with an >>>> error or fall back to BIGGEST_ALIGNMENT (similarly during propagation >>>> I suppose). >>>> >>>> +DEF_BUILTIN_STUB (BUILT_IN_ALLOCA_WITH_ALIGN, "alloca_with_align") >>>> >>>> Oh, I see you are not exposing this to users, so ignore the comments >>>> about non-constant align. > >>>> Can you move this >>>> DEF down, near the other DEF_BUILTIN_STUBs and add a comment >>>> that this is used for VLAs? > > Done. > >>>> >>>> + build_int_cstu (size_type_node, >>>> + DECL_ALIGN (parm))); >>>> >>>> size_int (DECL_ALIGN (parm)), similar in other places. >>>> > > Done. > >>>> Let's see if there are comments from others on this - which I like. >>>> >>> >>> Thanks, >>> - Tom >>> >>>> Thanks, >>>> Richard. >>>> >>>>> Thanks, >>>>> - Tom >>>>> >>> >>> > > Also, the alloca_with_align (a,b) now appears in assembly as alloca (a, b), as > discussed in http://gcc.gnu.org/ml/gcc-patches/2011-10/msg00307.html. > > Bootstrapped and regtested (including ada) on x86_64. > > OK for trunk?
Ok. Thanks, Richard. > Thanks, > - Tom > > 2011-10-05 Tom de Vries <t...@codesourcery.com> > > PR middle-end/50527 > * tree.c (build_common_builtin_nodes): Add local_define_builtin for > BUILT_IN_ALLOCA_WITH_ALIGN. Mark that BUILT_IN_ALLOCA_WITH_ALIGN can > throw. > * builtins.c (expand_builtin_alloca): Handle BUILT_IN_ALLOCA_WITH_ALIGN > arglist. Set align for BUILT_IN_ALLOCA_WITH_ALIGN. > (expand_builtin): Handle BUILT_IN_ALLOCA_WITH_ALIGN. > (is_inexpensive_builtin): Handle BUILT_IN_ALLOCA_WITH_ALIGN. > * tree-ssa-ccp.c (evaluate_stmt): Set align for > BUILT_IN_ALLOCA_WITH_ALIGN. > (fold_builtin_alloca_for_var): Rename to ... > (fold_builtin_alloca_with_align): Set DECL_ALIGN from 2nd > BUILT_IN_ALLOCA_WITH_ALIGN argument. > (ccp_fold_stmt): Try folding BUILT_IN_ALLOCA_WITH_ALIGN using > fold_builtin_alloca_with_align. > (optimize_stack_restore): Handle BUILT_IN_ALLOCA_WITH_ALIGN. > * builtins.def (BUILT_IN_ALLOCA_WITH_ALIGN): Declare using > DEF_BUILTIN_STUB. > * ipa-pure-const.c (special_builtin_state): Handle > BUILT_IN_ALLOCA_WITH_ALIGN. > * tree-ssa-alias.c (ref_maybe_used_by_call_p_1) > (call_may_clobber_ref_p_1): Same. > * function.c (gimplify_parameters): Lower vla to > BUILT_IN_ALLOCA_WITH_ALIGN. > * gimplify.c (gimplify_vla_decl): Same. > * cfgexpand.c (expand_call_stmt): Handle BUILT_IN_ALLOCA_WITH_ALIGN. > * tree-mudflap.c (mf_xform_statements): Same. > * tree-ssa-dce.c (mark_stmt_if_obviously_necessary) > (mark_all_reaching_defs_necessary_1, propagate_necessity): Same. > * varasm.c (incorporeal_function_p): Same. > * tree-object-size.c (alloc_object_size): Same. > * gimple.c (gimple_build_call_from_tree): Same. > > * gcc.dg/pr50527.c: New test. >