On Thu, 16 May 2019, Jakub Jelinek wrote: > On Thu, May 16, 2019 at 09:53:06AM +0200, Richard Biener wrote: > > > + if (nonconst_simd_if) > > > + { > > > + if (sctx.lane == NULL_TREE) > > > + { > > > + sctx.idx = create_tmp_var (unsigned_type_node); > > > + sctx.lane = create_tmp_var (unsigned_type_node); > > > + } > > > > Does forcing a SIMD_LANE possibly pessimize things? Just looking > > whether a separate IFN might be better here or even doing the > > versioning in omp expansion/lowering? But see question below... > > If the GOMP_SIMD_LANE isn't used for anything else, then it is a matter > of having the .GOMP_SIMD_LANE NOVOPS call inside of the loop. I don't think > it pessimizes anything compared to having some other IFN in the loop that > does it; I've thought about doing the versioning in omp expansion, but then > we treat those two loops as unrelated loops, would optimize them differently > and wouldn't be able to share the scalar loop that might be needed for > some kind of other versioning in the vectorizer with that other scalar loop. > > > > case IFN_GOMP_SIMD_LANE: > > > + if (gimple_call_num_args (stmt) >= 2 > > > + && !integer_nonzerop (gimple_call_arg (stmt, 1))) > > > + break; > > > + /* FALLTHRU */ > > > > GOMP_SIMD_LANE has ECF_NOVOPS, how do you prevent code motion of it > > if the result is unused? Even with VOPs the vectorized loop may be > > a reduction not involving any memory operations. > > First of all, at least in OpenMP standard, both the simdlen and if clauses > determine the preferred number of concurrent simd lanes, which doesn't seem > like a language that forbids anything else, just that it would be nice to > honor what the user asked for if possible. If the user asks for some > nonsensical simdlen, we aren't going to support that anyway. > With ECF_NOVOPS, rather than ECF_PURE or ECF_CONST, the function has side > effects unknown to the compiler, so not sure how we could move it outside of > the loop, and the code doesn't care where exactly in the loop it is. > > > > + if (loop->simduid) > > > + { > > > + gimple *g = gsi_stmt (si); > > > + /* If .GOMP_SIMD_LANE call for the current loop has 2 arguments, > > > + the second argument is the #pragma omp simd if (x) condition, > > > + when 0, loop shouldn't be vectorized, when non-zero constant, > > > + it should be vectorized normally, otherwise versioned with > > > + vectorized loop done if the condition is non-zero at > > > + runtime. */ > > > + if (is_gimple_call (g) > > > + && gimple_call_internal_p (g) > > > + && gimple_call_internal_fn (g) == IFN_GOMP_SIMD_LANE > > > + && gimple_call_num_args (g) >= 2 > > > + && TREE_CODE (gimple_call_arg (g, 0)) == SSA_NAME > > > + && (loop->simduid > > > + == SSA_NAME_VAR (gimple_call_arg (g, 0)))) > > > + { > > > + tree arg = gimple_call_arg (g, 1); > > > + if (integer_zerop (arg)) > > > + return opt_result::failure_at (g, > > > + "not vectorized: " > > > + "simd if(0)\n"); > > > + if (TREE_CODE (arg) == SSA_NAME) > > > + LOOP_VINFO_SIMD_IF_COND (loop_vinfo) = arg; > > > + } > > > + } > > > > This looks like a quite arbitrary place to do this, it wastes > > compile-time in case of a zero arg. Can't you do this in > > I was looking for some spot where we walk over the statements in the loop > already, rather than adding yet another IL walk. Sure, it could be guarded > with loop->simduid and so not be done unless it is a loop with > .GOMP_SIMD_LANE call. > I'd think the zero case will be extremely rare there, we handle literal > if (0) much earlier, and I think it will be fairly unusual if we propagate > a zero into the condition otherwise. > > > vect_analyze_loop before the loop over vector sizes? Maybe > > re-using what note_simd_array_uses computes? > > note_simd_array_uses indeed does walk the IL and does look at the calls, > but I'd need some data structure where to store the argument; we don't have > loop_vinfo yet (we don't have it even before the loop over vector sizes), > adding another tree to struct loop seems undesirable from memory usage POV, > we'd need it just for the duration between note_simd_array_uses and > the actual loop_vinfo creation; so would you prefer some extra hash_map for > that?
Maybe that or move it to the _loop_vec_info constructor which also walks over the loop body for setting UIDs and creating stmt infos? > > Also I wonder what happens if arg is not SSA name - I guess > > you omitted the > > > > else > > gcc_assert (integer_onep (arg)); > > > > ? Otherwise just dropping the condition looks wrong. > > Maybe integer_nonzerop (arg) (even though the gimplifier runs > gimple_boolify on the argument). Not really sure if some non-INTEGER_CST > invariant could make it through if it has been boolified. > > > Is a zero argument really a "force no vectorization" or should > > it merely reset force_vectorize to zero? > > It states that the user prefers to not vectorize that loop, it isn't a hard > requirement (and it is hard to figure it out), but I think we should honor > that request if possible and just not vectorize. > > > > + if (version_simd_if_cond) > > > + { > > > + gcc_assert (TREE_CODE (version_simd_if_cond) == SSA_NAME); > > > + gcc_assert (dom_info_available_p (CDI_DOMINATORS)); > > > + if (basic_block bb > > > + = gimple_bb (SSA_NAME_DEF_STMT (version_simd_if_cond))) > > > + { > > > + if (!dominated_by_p (CDI_DOMINATORS, loop->header, bb) > > > + || (scalar_loop > > > + && !dominated_by_p (CDI_DOMINATORS, scalar_loop->header, > > > + bb))) > > > + version_simd_if_cond = boolean_false_node; > > > > How can this ever happen? A loop has a single entry and I hope > > omp lowering places the condition computation on that entry. > > Yes, the gimplification of the condition is before the loop header. > I was just trying to be safe, because the .GOMP_SIMD_LANE call is inside of > the loop, not before the loop and some optimization might in theory move the > def stmt of the argument SSA_NAME from before the loop to inside the loop > (not aware of any that does it). Sure, in that case we wouldn't honor what > user preferred. OK, I see. Indeed in theory something could sink the def which I'd call a bug - so maybe a gcc_checking_assert that this doesn't happen would be nice. > > dominators are available since loops are up-to-date so the DOM > > assertion is not necessary. Likewise the SSA_NAME assert > > is covered by tree checking on the SSA_NAME_DEF_STMT. > > Ok, will take the == SSA_NAME assert out. > > > > > + } > > > + tree zero = build_int_cst (TREE_TYPE (version_simd_if_cond), 0); > > > > build_zero_cst (TREE_TYPE (version_simd_if_cond)) > > Is that better (build_zero_cst is a wrapper that will call build_int_cst > with 0)? A lot of code calls build_int_cst directly. Don't care much > though. it's just shorter... ;) Richard.