On Thu, 16 May 2019, Jakub Jelinek wrote:

> On Thu, May 16, 2019 at 09:53:06AM +0200, Richard Biener wrote:
> > > +  if (nonconst_simd_if)
> > > +    {
> > > +      if (sctx.lane == NULL_TREE)
> > > + {
> > > +   sctx.idx = create_tmp_var (unsigned_type_node);
> > > +   sctx.lane = create_tmp_var (unsigned_type_node);
> > > + }
> > 
> > Does forcing a SIMD_LANE possibly pessimize things?  Just looking
> > whether a separate IFN might be better here or even doing the
> > versioning in omp expansion/lowering?  But see question below...
> 
> If the GOMP_SIMD_LANE isn't used for anything else, then it is a matter
> of having the .GOMP_SIMD_LANE NOVOPS call inside of the loop.  I don't think
> it pessimizes anything compared to having some other IFN in the loop that
> does it; I've thought about doing the versioning in omp expansion, but then
> we treat those two loops as unrelated loops, would optimize them differently
> and wouldn't be able to share the scalar loop that might be needed for
> some kind of other versioning in the vectorizer with that other scalar loop.
> 
> > >                 case IFN_GOMP_SIMD_LANE:
> > > +                 if (gimple_call_num_args (stmt) >= 2
> > > +                     && !integer_nonzerop (gimple_call_arg (stmt, 1)))
> > > +                   break;
> > > +                 /* FALLTHRU */
> > 
> > GOMP_SIMD_LANE has ECF_NOVOPS, how do you prevent code motion of it
> > if the result is unused?  Even with VOPs the vectorized loop may be
> > a reduction not involving any memory operations.
> 
> First of all, at least in OpenMP standard, both the simdlen and if clauses
> determine the preferred number of concurrent simd lanes, which doesn't seem
> like a language that forbids anything else, just that it would be nice to
> honor what the user asked for if possible.  If the user asks for some
> nonsensical simdlen, we aren't going to support that anyway.
> With ECF_NOVOPS, rather than ECF_PURE or ECF_CONST, the function has side
> effects unknown to the compiler, so not sure how we could move it outside of
> the loop, and the code doesn't care where exactly in the loop it is.
> 
> > > +   if (loop->simduid)
> > > +     {
> > > +       gimple *g = gsi_stmt (si);
> > > +       /* If .GOMP_SIMD_LANE call for the current loop has 2 arguments,
> > > +          the second argument is the #pragma omp simd if (x) condition,
> > > +          when 0, loop shouldn't be vectorized, when non-zero constant,
> > > +          it should be vectorized normally, otherwise versioned with
> > > +          vectorized loop done if the condition is non-zero at
> > > +          runtime.  */
> > > +       if (is_gimple_call (g)
> > > +           && gimple_call_internal_p (g)
> > > +           && gimple_call_internal_fn (g) == IFN_GOMP_SIMD_LANE
> > > +           && gimple_call_num_args (g) >= 2
> > > +           && TREE_CODE (gimple_call_arg (g, 0)) == SSA_NAME
> > > +           && (loop->simduid
> > > +               == SSA_NAME_VAR (gimple_call_arg (g, 0))))
> > > +         {
> > > +           tree arg = gimple_call_arg (g, 1);
> > > +           if (integer_zerop (arg))
> > > +             return opt_result::failure_at (g,
> > > +                                            "not vectorized: "
> > > +                                            "simd if(0)\n");
> > > +           if (TREE_CODE (arg) == SSA_NAME)
> > > +             LOOP_VINFO_SIMD_IF_COND (loop_vinfo) = arg;
> > > +         }
> > > +     }
> > 
> > This looks like a quite arbitrary place to do this, it wastes
> > compile-time in case of a zero arg.  Can't you do this in
> 
> I was looking for some spot where we walk over the statements in the loop
> already, rather than adding yet another IL walk.  Sure, it could be guarded
> with loop->simduid and so not be done unless it is a loop with
> .GOMP_SIMD_LANE call.
> I'd think the zero case will be extremely rare there, we handle literal
> if (0) much earlier, and I think it will be fairly unusual if we propagate
> a zero into the condition otherwise.
> 
> > vect_analyze_loop before the loop over vector sizes?  Maybe
> > re-using what note_simd_array_uses computes?
> 
> note_simd_array_uses indeed does walk the IL and does look at the calls,
> but I'd need some data structure where to store the argument; we don't have
> loop_vinfo yet (we don't have it even before the loop over vector sizes),
> adding another tree to struct loop seems undesirable from memory usage POV,
> we'd need it just for the duration between note_simd_array_uses and
> the actual loop_vinfo creation; so would you prefer some extra hash_map for
> that?

Maybe that or move it to the _loop_vec_info constructor which also
walks over the loop body for setting UIDs and creating stmt infos?

> > Also I wonder what happens if arg is not SSA name - I guess
> > you omitted the
> > 
> >    else
> >      gcc_assert (integer_onep (arg));
> > 
> > ?  Otherwise just dropping the condition looks wrong.
> 
> Maybe integer_nonzerop (arg) (even though the gimplifier runs
> gimple_boolify on the argument).  Not really sure if some non-INTEGER_CST
> invariant could make it through if it has been boolified.
> 
> > Is a zero argument really a "force no vectorization" or should
> > it merely reset force_vectorize to zero?
> 
> It states that the user prefers to not vectorize that loop, it isn't a hard
> requirement (and it is hard to figure it out), but I think we should honor
> that request if possible and just not vectorize.
> 
> > > +  if (version_simd_if_cond)
> > > +    {
> > > +      gcc_assert (TREE_CODE (version_simd_if_cond) == SSA_NAME);
> > > +      gcc_assert (dom_info_available_p (CDI_DOMINATORS));
> > > +      if (basic_block bb
> > > +   = gimple_bb (SSA_NAME_DEF_STMT (version_simd_if_cond)))
> > > + {
> > > +   if (!dominated_by_p (CDI_DOMINATORS, loop->header, bb)
> > > +       || (scalar_loop
> > > +           && !dominated_by_p (CDI_DOMINATORS, scalar_loop->header,
> > > +                               bb)))
> > > +     version_simd_if_cond = boolean_false_node;
> > 
> > How can this ever happen?  A loop has a single entry and I hope
> > omp lowering places the condition computation on that entry.
> 
> Yes, the gimplification of the condition is before the loop header.
> I was just trying to be safe, because the .GOMP_SIMD_LANE call is inside of
> the loop, not before the loop and some optimization might in theory move the
> def stmt of the argument SSA_NAME from before the loop to inside the loop
> (not aware of any that does it).  Sure, in that case we wouldn't honor what
> user preferred.

OK, I see.  Indeed in theory something could sink the def which I'd
call a bug - so maybe a gcc_checking_assert that this doesn't
happen would be nice.

> > dominators are available since loops are up-to-date so the DOM
> > assertion is not necessary.  Likewise the SSA_NAME assert
> > is covered by tree checking on the SSA_NAME_DEF_STMT.
> 
> Ok, will take the == SSA_NAME assert out.
> > 
> > > + }
> > > +      tree zero = build_int_cst (TREE_TYPE (version_simd_if_cond), 0);
> > 
> > build_zero_cst (TREE_TYPE (version_simd_if_cond))
> 
> Is that better (build_zero_cst is a wrapper that will call build_int_cst
> with 0)?  A lot of code calls build_int_cst directly.  Don't care much
> though.

it's just shorter... ;)

Richard.

Reply via email to