On Tue, Aug 21, 2018 at 7:49 PM Caio Marcelo de Oliveira Filho <[email protected]> wrote:
> On Sat, Jul 28, 2018 at 10:44:41PM -0700, Jason Ekstrand wrote:
> > This peephole optimization looks for a series of load/store_deref or
> > copy_deref instructions that copy an array from one variable to another
> > and turns it into a copy_deref that copies the entire array.  The
> > pattern it looks for is extremely specific, but it's good enough to pick
> > up on the input array copies in DXVK and should also be able to pick up
> > the sequence generated by spirv_to_nir for an OpLoad of a large composite
> > followed by OpStore.  It can always be improved later if needed.
>
> I liked this one :-)
>
> I'd add a comment to the file with a subset of this description.  In
> particular, I'd consider mentioning that the pass expects the
> loads/stores/copies to happen in order, and also that any other
> stores/copies to different variables will spoil the detection.
>
> There are a few minor comments and a comment about the store write mask.
> Assuming those are resolved, this patch is:
>
> Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>

I've applied your suggestions.  Please double-check:

https://gitlab.freedesktop.org/jekstrand/mesa/commit/28c01e9a0ce84ff53fa8805e4a35a691ea3fc744
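For anyone skimming the patch: the shape being matched is an unrolled,
in-order, element-by-element copy.  As an illustration (the names "dst",
"src", and "n" are mine, not from the patch):

   dst[0] = src[0];
   dst[1] = src[1];
   /* ..., in order, one constant index per element, ... */
   dst[n-1] = src[n-1];

which reaches NIR as a run of load_deref/store_deref pairs (or
copy_derefs) with constant indices 0 through n-1.  Once the last
element is seen, the whole run is replaced by a single copy_deref
between array-wildcard derefs, effectively "dst = src;".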
> Note: the "deref_map" (well, an evolution of it) from the dead write
> elimination series could be used to keep track of multiple such
> copies, with fewer restrictions.  That said, it seems that for the
> case we care about it is not necessary.
>
> (...)
>
> > +static bool
> > +opt_find_array_copies_block(nir_builder *b, nir_block *block,
> > +                            unsigned *num_ssa_defs, void *mem_ctx)
> > +{
> > +   bool progress = false;
> > +
> > +   struct match_state s;
> > +   match_state_init(&s);
> > +
> > +   nir_variable *dst_var = NULL;
> > +   unsigned prev_dst_var_last_write = *num_ssa_defs;
> > +   unsigned dst_var_last_write = *num_ssa_defs;
> > +
> > +   nir_foreach_instr(instr, block) {
> > +      /* Index the SSA defs before we do anything else. */
> > +      nir_foreach_ssa_def(instr, index_ssa_def_cb, num_ssa_defs);
> > +
> > +      if (instr->type != nir_instr_type_intrinsic)
> > +         continue;
> > +
> > +      nir_intrinsic_instr *intrin = nir_instr_as_intrinsic(instr);
> > +      if (intrin->intrinsic != nir_intrinsic_copy_deref &&
> > +          intrin->intrinsic != nir_intrinsic_store_deref)
> > +         continue;
> > +
> > +      nir_deref_instr *dst_deref = nir_src_as_deref(intrin->src[0]);
>
> Given it is cheap enough, would it make sense to early return if
> dst_deref is a var, at least before we make it an active copy?
> Similar case for src_deref.  The code is correct (try_match_deref
> will fail for a variable since it can't find an index), so I'll leave
> this up to you.

I think we probably could.  I'll add something.

> > +
> > +      /* The destination must be local.  If we see a non-local store, we
> > +       * continue on because it won't affect local stores or read-only
> > +       * variables.
> > +       */
> > +      if (dst_deref->mode != nir_var_local)
> > +         continue;
>
> For correctness, should we reset if this is not a full write mask
> store?

Yes, we should.  I'm not sure if that can happen in practice right now,
but we should put it in there.
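Something along these lines is what I have in mind (untested sketch;
assuming nir_intrinsic_write_mask() and glsl_get_components() are the
right helpers here):

   /* A store that writes only some of the destination's components
    * can't be part of a whole-array copy, and it still clobbers data,
    * so treat it as a reset rather than skipping it.
    */
   if (intrin->intrinsic == nir_intrinsic_store_deref &&
       nir_intrinsic_write_mask(intrin) !=
       (1u << glsl_get_components(dst_deref->type)) - 1)
      goto fail;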
> > +      /* We keep track of the SSA indicies where the two last-written
>
> Typo here and elsewhere: "indices".

Fixed.  Thanks!

> > +       * variables are written.  The prev_dst_var_last_write tells us when
> > +       * the last store_deref to something other than dst happened.  If the
> > +       * SSA def index from a load is greater than or equal to this number,
> > +       * then we know it happened afterwards and no writes to anything other
> > +       * than dst occur between the load and the current instruction.
> > +       */
> > +      if (nir_deref_instr_get_variable(dst_deref) != dst_var) {
> > +         prev_dst_var_last_write = dst_var_last_write;
> > +         dst_var = nir_deref_instr_get_variable(dst_deref);
> > +      }
> > +      dst_var_last_write = *num_ssa_defs;
> > +
> > +      nir_deref_instr *src_deref;
> > +      if (intrin->intrinsic == nir_intrinsic_copy_deref) {
> > +         src_deref = nir_src_as_deref(intrin->src[1]);
> > +      } else {
> > +         assert(intrin->intrinsic == nir_intrinsic_store_deref);
> > +         src_deref = get_deref_for_load_src(intrin->src[1],
> > +                                            prev_dst_var_last_write);
> > +      }
> > +
> > +      /* If we didn't find a valid src, then we have an unknown store and it
> > +       * could mess things up.
> > +       */
> > +      if (src_deref == NULL)
> > +         goto fail;
> > +
> > +      /* The destination must be local and the source must be either local or
> > +       * something that's guaranteed to be read-only.
> > +       */
> > +      const nir_variable_mode read_only_modes =
> > +         nir_var_shader_in | nir_var_uniform | nir_var_system_value;
> > +      if (!(src_deref->mode & (nir_var_local | read_only_modes)))
> > +         goto fail;
> > +
> > +      /* If we don't yet have an active copy, then make this instruction the
> > +       * active copy.
> > +       */
>
> A note explaining that in the first store we still don't know which
> array in the path is being copied would have helped me when reading
> this.

Done.

> > +      if (s.next_array_idx == 0) {
> > +         /* We can't combine a copy if there is any chance the source and
> > +          * destination will end up aliasing.  Just bail if they're the same
> > +          * variable.
> > +          */
> > +         if (nir_deref_instr_get_variable(dst_deref) ==
> > +             nir_deref_instr_get_variable(src_deref))
>
> Could reuse dst_var instead of recalculating.

Sure.

> > +            goto fail;
> > +
> > +         /* The load/store pair is enough to guarantee the same bit size and
> > +          * number of components, but a copy_var requires the actual types to
> > +          * match.
> > +          */
> > +         if (dst_deref->type != src_deref->type)
> > +            continue;
> > +
> > +         nir_deref_path_init(&s.first_dst_path, dst_deref, mem_ctx);
> > +         nir_deref_path_init(&s.first_src_path, src_deref, mem_ctx);
> > +         s.next_array_idx = 1;
> > +         continue;
> > +      }
> > +
> > +      if (!try_match_deref(&s.first_dst_path, &s.dst_deref_array_idx,
> > +                           dst_deref, s.next_array_idx, mem_ctx) ||
> > +          !try_match_deref(&s.first_src_path, &s.src_deref_array_idx,
> > +                           src_deref, s.next_array_idx, mem_ctx))
> > +         goto fail;
> > +
> > +      if (s.next_array_idx == 1) {
> > +         /* This is our first non-trivial match.  We now have indicies into
> > +          * the search paths so we can do a couple more checks.
> > +          */
> > +         assert(s.dst_deref_array_idx > 0 && s.src_deref_array_idx > 0);
> > +         const struct glsl_type *dst_arr_type =
> > +            s.first_dst_path.path[s.dst_deref_array_idx - 1]->type;
> > +         const struct glsl_type *src_arr_type =
> > +            s.first_src_path.path[s.src_deref_array_idx - 1]->type;
> > +
> > +         assert(glsl_type_is_array(dst_arr_type) ||
> > +                glsl_type_is_matrix(dst_arr_type));
> > +         assert(glsl_type_is_array(src_arr_type) ||
> > +                glsl_type_is_matrix(src_arr_type));
> > +
> > +         /* They must be the same length. */
> > +         s.array_len = glsl_get_length(dst_arr_type);
> > +         if (s.array_len != glsl_get_length(src_arr_type))
> > +            goto fail;
> > +      }
> > +
> > +      s.next_array_idx++;
> > +
> > +      if (s.next_array_idx == s.array_len) {
> > +         /* Hooray, we found a copy! */
> > +         b->cursor = nir_after_instr(instr);
> > +         nir_copy_deref(b, build_wildcard_deref(b, &s.first_dst_path,
> > +                                                s.dst_deref_array_idx),
> > +                           build_wildcard_deref(b, &s.first_src_path,
> > +                                                s.src_deref_array_idx));
> > +         match_state_reset(&s);
> > +         progress = true;
> > +      }
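For readers following along: build_wildcard_deref isn't in this hunk.
Roughly, it rebuilds the saved deref path with the array deref at the
recorded index replaced by a wildcard, along these lines (simplified
sketch; the real helper may differ):

   static nir_deref_instr *
   build_wildcard_deref(nir_builder *b, nir_deref_path *path,
                        unsigned arr_idx)
   {
      assert(path->path[arr_idx]->deref_type == nir_deref_type_array);

      /* Replace the array deref at arr_idx with a wildcard. */
      nir_deref_instr *tail =
         nir_build_deref_array_wildcard(b, path->path[arr_idx - 1]);

      /* Re-attach any derefs that followed the array index. */
      for (unsigned i = arr_idx + 1; path->path[i]; i++)
         tail = nir_build_deref_follower(b, tail, path->path[i]);

      return tail;
   }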
> > +
> > +      continue;
> > +
> > +   fail:
> > +      match_state_reset(&s);
> > +   }
>
> Maybe call this label "reset:" instead of "fail:".

Sure.  That's reasonable.

--Jason

_______________________________________________
mesa-dev mailing list
[email protected]
https://lists.freedesktop.org/mailman/listinfo/mesa-dev
