Thanks for using maskload/store_optab and sorry for the very slow review.
Been a bit swamped lately...
The patch seems to be using maskload and maskstore as though they were
variants of movMMcc, with the comparison being part of the load/store.
Instead, the current interface is that:
maskload <op0, op1, op2, op3>
would do:
op0 = op2 ? op1 : op3
where op2 is an already-computed condition. Similarly:
maskstore <op0, op1, op2>
would do:
if (op2) op0 = op1
where again op2 is an already-computed condition. (Note that maskstore
only has three operands, not four.)
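In C terms, the intended behaviour is roughly the following (a minimal
sketch of the semantics only, not of the expander interface; the helper
names are made up for illustration).  The key property is that the memory
access itself is suppressed, rather than speculated, when the condition
is false:

  /* maskload semantics: op0 = op2 ? op1 : op3.  *ADDR is not accessed
     at all when COND is false, so a faulting address is tolerated on
     that path.  */
  static inline int
  mask_load_si (const int *addr, int cond, int else_value)
  {
    return cond ? *addr : else_value;
  }

  /* maskstore semantics: if (op2) op0 = op1.  Again, *ADDR is not
     touched when COND is false.  */
  static inline void
  mask_store_si (int *addr, int value, int cond)
  {
    if (cond)
      *addr = value;
  }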
This would be the natural interface if we were trying to generate these
instructions from gimple. But I suppose the interface makes things awkward
for this patch, where you're trying to generate the pattern from an RTL pass.
So this raises two questions:
(1) How should we handle the requirement to have a comparison operand,
instead of the normal precomputed boolean operand?
(2) maskload and maskstore take two modes: the mode of the data being
loaded/stored, and the mode of the condition. What should the mode
of the condition be if the operand is a comparison?
TBH I'm not sure what to do here. One option would be to emit a separate
cstore (emit_store_flag) and then pass the result of that cstore to operand 2
of the maskload/store. The mode of the operand could be the integer
equivalent of the value being loaded/stored (e.g. SI when loading or
storing SF). I think this would work best with any future gimple
support. But it likely means that we rely on combine to eliminate
redundant comparison-of-cstore sequences.
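For what it's worth, a rough and completely untested sketch of that option
(the helper name noce_emit_mask_load and the operand ordering are
assumptions based on the interface described above, not existing code):

  /* Option (1): compute the condition as a 0/1 value with emit_store_flag
     and pass it to the maskload pattern as operand 2.  COND_MODE is the
     integer equivalent of MODE (e.g. SImode when MODE is SFmode).  */
  static rtx
  noce_emit_mask_load (rtx target, rtx mem, enum rtx_code code,
                       rtx cmp_op0, rtx cmp_op1, machine_mode cmp_mode,
                       rtx else_val)
  {
    machine_mode mode = GET_MODE (target);
    scalar_int_mode cond_mode;
    if (!int_mode_for_mode (mode).exists (&cond_mode))
      return NULL_RTX;

    insn_code icode = convert_optab_handler (maskload_optab, mode, cond_mode);
    if (icode == CODE_FOR_nothing)
      return NULL_RTX;

    rtx cond = emit_store_flag (gen_reg_rtx (cond_mode), code, cmp_op0,
                                cmp_op1, cmp_mode, /*unsignedp=*/0,
                                /*normalizep=*/1);
    if (!cond)
      return NULL_RTX;

    expand_operand ops[4];
    create_output_operand (&ops[0], target, mode);
    create_input_operand (&ops[1], mem, mode);
    create_input_operand (&ops[2], cond, cond_mode);
    create_input_operand (&ops[3], else_val, mode);
    if (!maybe_expand_insn (icode, 4, ops))
      return NULL_RTX;
    return ops[0].value;
  }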
Another option would be to allow operand 2 to be a comparison operand,
as for movMMcc. Regarding (2), we could choose to use VOIDmode for
the second mode, since (a) that is the actual mode of a canonicalised
comparison and (b) it should safely isolate the comparison usage from
the non-comparison usage. If no-one has any better suggestions,
I suppose we should do this. It isn't mutually exclusive with the
first option: we could still handle precomputed boolean conditions
as well, in a future patch.
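For reference, the capability check for that variant could then be keyed
on VOIDmode, something like the untested sketch below
(noce_can_use_cmp_maskload is just an illustrative name):

  /* Option (2): operand 2 of maskload/maskstore is the comparison itself,
     as for movMODEcc, so the "condition" mode of the optab query is
     VOIDmode.  */
  static bool
  noce_can_use_cmp_maskload (struct noce_if_info *if_info, machine_mode mode)
  {
    return (COMPARISON_P (if_info->cond)
            && (convert_optab_handler (maskload_optab, mode, VOIDmode)
                != CODE_FOR_nothing));
  }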
"Kong, Lingling" <[email protected]> writes:
> @@ -2132,6 +2134,54 @@ noce_emit_bb (rtx last_insn, basic_block bb, bool simple)
> return true;
> }
>
> +/* Return TRUE if we could convert "if (test) *x = a; else skip" to
> + scalar mask store and could do conditional faulting movcc, i.e.
> + x86 cfcmov, especially when store x may cause memory faults and
> + in else_bb x == b. */
> +
> +static bool
> +can_use_scalar_mask_store (rtx x, rtx a, rtx b, bool a_simple)
> +{
> + gcc_assert (MEM_P (x));
> +
> + machine_mode x_mode = GET_MODE (x);
> + if (convert_optab_handler (maskstore_optab, x_mode,
> + x_mode) == CODE_FOR_nothing)
If we go for the option described above, the second mode here should
be the mode of if_info.cond.
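i.e. something like this, assuming the condition (or if_info itself) is
passed down to this helper:

  if (convert_optab_handler (maskstore_optab, x_mode,
                             GET_MODE (cond)) == CODE_FOR_nothing)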
> + return false;
> +
> + if (!rtx_equal_p (x, b) || !may_trap_or_fault_p (x))
> + return false;
> + if (!a_simple || !register_operand (a, x_mode))
> + return false;
Could you explain the purpose of the last if statement? I would have
expected noce_try_cmove_arith to handle other forms of "a" correctly
(as long as they don't fault -- more on that below).
> +
> + return true;
> +}
> +
> +/* Return TRUE if backend supports scalar maskload_optab/maskstore_optab,
> + which suppressed memory faults when load or store a memory operand
> + and the condition code evaluates to false. */
> +
> +static bool
> +can_use_scalar_mask_load_store (struct noce_if_info *if_info)
> +{
> + rtx a = if_info->a;
> + rtx b = if_info->b;
> + rtx x = if_info->x;
> +
> + if (!MEM_P (a) && !MEM_P (b))
> + return false;
> +
> + if (MEM_P (x))
> + return can_use_scalar_mask_store (x, a, b, if_info->then_simple);
> + else
> + /* Return TRUE if backend supports scalar maskload_optab, we could convert
> + "if (test) x = *a; else x = b;" or "if (test) x = a; else x = *b;"
> + to conditional faulting movcc, i.e. x86 cfcmov, especially when load a
> + or b may cause memory faults. */
> + return convert_optab_handler (maskstore_optab, GET_MODE (a),
> + GET_MODE (a)) != CODE_FOR_nothing;
It looks like this should be maskload_optab. "a" might be a VOIDmode
constant, so it's probably better to use GET_MODE (x) for the first mode.
The comment above about the second mode applies here too.
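i.e. something like:

    return (convert_optab_handler (maskload_optab, GET_MODE (x),
                                   GET_MODE (if_info->cond))
            != CODE_FOR_nothing);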
> +}
> +
> /* Try more complex cases involving conditional_move. */
>
> static bool
> @@ -2171,7 +2221,17 @@ noce_try_cmove_arith (struct noce_if_info *if_info)
> /* ??? We could handle this if we knew that a load from A or B could
> not trap or fault. This is also true if we've already loaded
> from the address along the path from ENTRY. */
This comment is now a little out of place.
> - else if (may_trap_or_fault_p (a) || may_trap_or_fault_p (b))
> + /* Just wait cse_not_expected, then convert to conditional mov on their
> + addresses followed by a load. */
> + else if (may_trap_or_fault_p (a) && may_trap_or_fault_p (b))
> + return false;
may_trap_or_fault_p (a) and may_trap_or_fault_p (b) are tested multiple
times. Since they're relatively expensive tests, it would be good to
cache them.
> + /* Scalar maskload_optab/maskstore_optab implements conditionally faulting
> + which means that if the condition code evaluates to false, all memory
> + faults are suppressed when load or store a memory operand. Now we could
> + load or store a memory operand may trap or fault for conditional
> + move. */
> + else if ((may_trap_or_fault_p (a) ^ may_trap_or_fault_p (b))
> + && !can_use_scalar_mask_load_store (if_info))
> return false;
I think it'd be more robust to keep the may_trap_or_fault_p and MEM_P
tests together. AIUI there are two cases:
(1) MEM_P (a) && a_may_trap_or_fault && !b_may_trap_or_fault
(2) MEM_P (b) && b_may_trap_or_fault && !a_may_trap_or_fault
The code currently requires !MEM_P (b) for the first case and !MEM_P (a)
for the second case. But I'm not sure that's necessary for correctness.
Is it a costing decision instead?
I think it'd be better to test:
else if ((<case 1> || <case 2>) && can_use_scalar_mask_load_store (if_info))
;
else if (a_may_trap_or_fault || b_may_trap_or_fault)
return false;
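Concretely, with the two results cached in locals, I'd expect something
like (untested):

  bool a_may_trap_or_fault = may_trap_or_fault_p (a);
  bool b_may_trap_or_fault = may_trap_or_fault_p (b);
  ...
  else if (((MEM_P (a) && a_may_trap_or_fault && !b_may_trap_or_fault)
            || (MEM_P (b) && b_may_trap_or_fault && !a_may_trap_or_fault))
           && can_use_scalar_mask_load_store (if_info))
    ;
  else if (a_may_trap_or_fault || b_may_trap_or_fault)
    return false;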
>
> /* if (test) x = a + b; else x = c - d;
> @@ -2247,9 +2307,14 @@ noce_try_cmove_arith (struct noce_if_info *if_info)
> /* If either operand is complex, load it into a register first.
> The best way to do this is to copy the original insn. In this
> way we preserve any clobbers etc that the insn may have had.
> - This is of course not possible in the IS_MEM case. */
> + This is of course not possible in the IS_MEM case.
> + For load or store a operands may trap or fault, should not
> + hoist the load or store, otherwise it unable to suppress memory
> + fault, it just a normal arithmetic insn insteads of conditional
> + faulting movcc. */
>
> - if (! general_operand (a, GET_MODE (a)) || tmp_a)
> + if (! may_trap_or_fault_p (a)
> + && (! general_operand (a, GET_MODE (a)) || tmp_a))
> {
>
> if (is_mem)
> @@ -2278,7 +2343,8 @@ noce_try_cmove_arith (struct noce_if_info *if_info)
> }
> }
>
> - if (! general_operand (b, GET_MODE (b)) || tmp_b)
> + if (! may_trap_or_fault_p (b)
> + && (! general_operand (b, GET_MODE (b)) || tmp_b))
> {
> if (is_mem)
> {
> @@ -4210,12 +4276,31 @@ noce_process_if_block (struct noce_if_info *if_info)
> }
>
> if (!set_b && MEM_P (orig_x))
> - /* We want to avoid store speculation to avoid cases like
> - if (pthread_mutex_trylock(mutex))
> - ++global_variable;
> - Rather than go to much effort here, we rely on the SSA optimizers,
> - which do a good enough job these days. */
> - return false;
> + {
> + /* When target support conditional faulting movcc, i.e. x86 cfcmov,
> + we could do conditional mem store for "if (...) *x = a; else skip"
> + to maskstore_optab, which x may trap or fault. */
> + if ((convert_optab_handler (maskstore_optab, GET_MODE (orig_x),
> + GET_MODE (orig_x)) != CODE_FOR_nothing)
> + && HAVE_conditional_move
> + && may_trap_or_fault_p (orig_x)
> + && register_operand (a, GET_MODE (orig_x)))
> + {
> + x = orig_x;
> + if_info->x = x;
> + if (noce_try_cmove_arith (if_info))
> + goto success;
> + else
> + return false;
> + }
> + /* We want to avoid store speculation to avoid cases like
> + if (pthread_mutex_trylock(mutex))
> + ++global_variable;
> + Rather than go to much effort here, we rely on the SSA optimizers,
> + which do a good enough job these days. */
> + else
> + return false;
> + }
>
> if (noce_try_move (if_info))
> goto success;
> diff --git a/gcc/optabs.cc b/gcc/optabs.cc
> index 03ef0c5d81d..524c766d336 100644
> --- a/gcc/optabs.cc
> +++ b/gcc/optabs.cc
> @@ -5085,6 +5085,16 @@ emit_conditional_move (rtx target, struct rtx_comparison comp,
>
> icode = direct_optab_handler (movcc_optab, mode);
>
> + if (may_trap_or_fault_p (target) && MEM_P (target)
> + && convert_optab_handler (maskstore_optab, mode,
> + mode) != CODE_FOR_nothing)
> + icode = convert_optab_handler (maskstore_optab, mode, mode);
> + else if ((may_trap_or_fault_p (op2) || may_trap_or_fault_p (op3))
> + && (MEM_P (op2) || MEM_P (op3))
> + && convert_optab_handler (maskload_optab,
> + mode, mode) != CODE_FOR_nothing)
> + icode = convert_optab_handler (maskload_optab, mode, mode);
> +
> if (icode == CODE_FOR_nothing)
> return NULL_RTX;
>
I think the maskload/store cases should instead be treated as separate
operations, rather than inferred from the operands. The current interface
of emit_conditional_move is that OP2 and OP3 can be evaluated unconditionally.
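For the record, a separate entry point might look something like the
untested sketch below (the name emit_conditional_fault_free_store is made
up for illustration; the operand layout follows the maskstore description
at the top of this message):

  /* Conditionally store VALUE into MEM when COND (of mode COND_MODE) is
     nonzero, suppressing any memory fault when COND is zero.  Return MEM
     on success, null otherwise.  */
  rtx
  emit_conditional_fault_free_store (rtx mem, rtx value, rtx cond,
                                     machine_mode cond_mode)
  {
    machine_mode mode = GET_MODE (mem);
    insn_code icode = convert_optab_handler (maskstore_optab, mode, cond_mode);
    if (icode == CODE_FOR_nothing)
      return NULL_RTX;

    expand_operand ops[3];
    create_fixed_operand (&ops[0], mem);
    create_input_operand (&ops[1], value, mode);
    create_input_operand (&ops[2], cond, cond_mode);
    if (!maybe_expand_insn (icode, 3, ops))
      return NULL_RTX;
    return mem;
  }

Callers would then opt into the masked behaviour explicitly, leaving the
usual guarantee about OP2 and OP3 in emit_conditional_move untouched.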
Thanks,
Richard
> @@ -5217,6 +5227,16 @@ emit_conditional_move_1 (rtx target, rtx comparison,
>
> icode = direct_optab_handler (movcc_optab, mode);
>
> + if (may_trap_or_fault_p (target) && MEM_P (target)
> + && convert_optab_handler (maskstore_optab, mode,
> + mode) != CODE_FOR_nothing)
> + icode = convert_optab_handler (maskstore_optab, mode, mode);
> + else if ((may_trap_or_fault_p (op2) || may_trap_or_fault_p (op3))
> + && (MEM_P (op2) || MEM_P (op3))
> + && convert_optab_handler (maskload_optab,
> + mode, mode) != CODE_FOR_nothing)
> + icode = convert_optab_handler (maskload_optab, mode, mode);
> +
> if (icode == CODE_FOR_nothing)
> return NULL_RTX;