On Tue, 30 Jan 2018, Richard Biener wrote:

> 
> This patch tries to deal with the "easy" part of a function ABI,
> the return value location, in vectorization costing.  The testcase
> shows that if we vectorize the returned value but the function
> doesn't return in memory or in a vector register but as in this
> case in an integer register pair (reg:TI ax) (bah, ABI details
> exposed late?  why's this not a parallel?) we end up spilling
> badly.
> 
> The idea is to account for such spilling so if vectorization
> benefits outweight the spilling we'll vectorize anyway.
> 
> I think the particular testcase could be fixed in the subreg
> pass basically undoing the vectorization but I realize that
> generally this is a too hard problem and avoiding vectorization
> is better.  Still this patch is somewhat fragile in that it
> depends on us "seeing" that the stored to decl is returned
> (see cfun_returns).
> 
> Bootstrap & regtest running on x86_64-unknown-linux-gnu.
> 
> I'd like to hear opinions on my use of hard_function_value
> and also from other target maintainers.  I'm not sure we
> have sufficient testsuite coverage of _profitable_ vectorization
> of a return value.  Feel free to add to this for your
> target.
> 
> Ok for trunk?

Ok, looks like hard_function_value doesn't like being called
with

    type <record_type 0x7ffff65009d8 optional asm_written 
needs-constructing type_1 type_5 type_6 BLK
        size <integer_cst 0x7ffff64ee558 constant 8008>

we then ICE:

#0  fancy_abort (
    file=0x2094ef0 
"/space/rguenther/src/svn/early-lto-debug/gcc/machmode.h", 
    line=292, 
    function=0x2095534 <opt_mode<scalar_int_mode>::require() 
const::__FUNCTION__> "require") at 
/space/rguenther/src/svn/early-lto-debug/gcc/diagnostic.c:1500
#1  0x0000000000c34790 in opt_mode<scalar_int_mode>::require (
    this=0x7fffffffcf20)
    at /space/rguenther/src/svn/early-lto-debug/gcc/machmode.h:292
#2  0x0000000000db271a in hard_function_value (valtype=0x7ffff65009d8, 
    func=0x7ffff64fff00, fntype=0x0, outgoing=1)
    at /space/rguenther/src/svn/early-lto-debug/gcc/explow.c:2212

2201          /* int_size_in_bytes can return -1.  We don't need a check 
here
2202             since the value of bytes will then be large enough that 
no
2203             mode will match anyway.  */
2204
2205          FOR_EACH_MODE_IN_CLASS (tmpmode, MODE_INT)
2206            {
(gdb) l
2207              /* Have we found a large enough mode?  */
2208              if (GET_MODE_SIZE (tmpmode.require ()) >= bytes)
2209                break;
2210            }
2211
2212          PUT_MODE (val, tmpmode.require ());

which means we didn't find an integer mode of the wanted size.
The backend simply returned (reg:BLK 0 ax).  I suppose I have
to guard hard_function_value with sth ...

Richard.

> Thanks,
> Richard.
> 
> 2018-01-30  Richard Biener  <rguent...@suse.de>
> 
>       PR tree-optimization/84101
>       * tree-vect-stmts.c: Include explow.h for hard_function_value.
>       (cfun_returns): New helper.
>       (vect_model_store_cost): When vectorizing a store to a decl
>       we return and the function ABI returns via a non-vector
>       register account for the possible spilling that will happen.
> 
>       * gcc.target/i386/pr84101.c: New testcase.
> 
> Index: gcc/tree-vect-stmts.c
> ===================================================================
> --- gcc/tree-vect-stmts.c     (revision 257139)
> +++ gcc/tree-vect-stmts.c     (working copy)
> @@ -43,6 +43,7 @@ along with GCC; see the file COPYING3.
>  #include "tree-cfg.h"
>  #include "tree-ssa-loop-manip.h"
>  #include "cfgloop.h"
> +#include "explow.h"
>  #include "tree-ssa-loop.h"
>  #include "tree-scalar-evolution.h"
>  #include "tree-vectorizer.h"
> @@ -893,6 +894,22 @@ vect_model_promotion_demotion_cost (stmt
>                       "prologue_cost = %d .\n", inside_cost, prologue_cost);
>  }
>  
> +/* Returns true if the current function returns DECL.  */
> +
> +static bool
> +cfun_returns (tree decl)
> +{
> +  edge_iterator ei;
> +  edge e;
> +  FOR_EACH_EDGE (e, ei, EXIT_BLOCK_PTR_FOR_FN (cfun)->preds)
> +    {
> +      greturn *ret = safe_dyn_cast <greturn *> (last_stmt (e->src));
> +      if (ret && gimple_return_retval (ret) == decl)
> +     return true;
> +    }
> +  return false;
> +}
> +
>  /* Function vect_model_store_cost
>  
>     Models cost for stores.  In the case of grouped accesses, one access
> @@ -971,6 +988,36 @@ vect_model_store_cost (stmt_vec_info stm
>                                      vec_to_scalar, stmt_info, 0, vect_body);
>      }
>  
> +  /* When vectorizing an SLP store into the function result assign
> +     a penalty if the function returns in a non-vector register.
> +     In this case we assume we'll end up with having to spill the
> +     vector result and do element loads.  */
> +  if (slp_node)
> +    {
> +      tree base = get_base_address (dr->ref);
> +      if (base
> +       && (TREE_CODE (base) == RESULT_DECL
> +           || (DECL_P (base) && cfun_returns (base))))
> +     {
> +       rtx reg = hard_function_value (TREE_TYPE (base), cfun->decl, 0, 1);
> +       if ((REG_P (reg) && ! VECTOR_MODE_P (GET_MODE (reg)))
> +           || GET_CODE (reg) == PARALLEL
> +           || GET_CODE (reg) == CONCAT)
> +         {
> +           /* Spill.  */
> +           inside_cost += record_stmt_cost (body_cost_vec,
> +                                            ncopies, vector_store,
> +                                            stmt_info, 0, vect_body);
> +           /* Element loads.  */
> +           unsigned int assumed_nunits = vect_nunits_for_cost (vectype);
> +           inside_cost += record_stmt_cost (body_cost_vec,
> +                                            ncopies * assumed_nunits,
> +                                            scalar_load,
> +                                            stmt_info, 0, vect_body);
> +         }
> +     }
> +    }
> +
>    if (dump_enabled_p ())
>      dump_printf_loc (MSG_NOTE, vect_location,
>                       "vect_model_store_cost: inside_cost = %d, "
> Index: gcc/testsuite/gcc.target/i386/pr84101.c
> ===================================================================
> --- gcc/testsuite/gcc.target/i386/pr84101.c   (nonexistent)
> +++ gcc/testsuite/gcc.target/i386/pr84101.c   (working copy)
> @@ -0,0 +1,21 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -fdump-tree-slp2-details" } */
> +
> +typedef struct uint64_pair uint64_pair_t ;
> +struct uint64_pair
> +{
> +  unsigned long w0 ;
> +  unsigned long w1 ;
> +} ;
> +
> +uint64_pair_t pair(int num)
> +{
> +  uint64_pair_t p ;
> +
> +  p.w0 = num << 1 ;
> +  p.w1 = num >> 1 ;
> +
> +  return p ;
> +}
> +
> +/* { dg-final { scan-tree-dump-not "basic block vectorized" "slp2" } } */
> 

-- 
Richard Biener <rguent...@suse.de>
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)

Reply via email to