Robin Dapp via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> Hi,
>
> I have been playing around with making Kewen's partial vector changes 
> workable with s390:
>
> We have a vll instruction that can be passed the highest byte to load. 
> The rather unfortunate consequence of this is that a length of zero 
> cannot be specified.  The partial vector framework, however, relies a 
> lot on the fact that a len_load can be made a NOP using a length of zero.
>
> After confirming an additional zero-check before each vll is definitely 
> too slow across SPEC and some discussion with Kewen we figured the 
> easiest way forward is to exclude loops with multiple VFs (despite 
> giving up vectorization possibilities).  These are prone to len_loads 
> with zero while the regular induction variable check prevents them in 
> single-VF loops.
>
> So, as a quick hack, I went with
>
> diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
> index 75f24e7c4f6..f79222daeb6 100644
> --- a/gcc/tree-vect-loop.c
> +++ b/gcc/tree-vect-loop.c
> @@ -1170,6 +1170,9 @@ vect_verify_loop_lens (loop_vec_info loop_vinfo)
>     if (LOOP_VINFO_LENS (loop_vinfo).is_empty ())
>       return false;
>
> +  if (LOOP_VINFO_LENS (loop_vinfo).length () > 1)
> +    return false;
> +

Yeah, I think this should be sufficient.

> which could be made a hook, eventually.  FWIW this is sufficient to make 
> bootstrap, regtest and compiling the SPEC suites succeed.  I'm unsure 
> whether we are guaranteed not to emit len_load with zero now.   On top, 
> I subtract 1 from the passed length in the expander, which, supposedly, 
> is also not ideal.

Exposing the subtraction in gimple would certainly allow for
more optimisation.

We already have code to probe the predicates of the underlying
define_expands/insns to see whether they support certain constant
IFN arguments; see e.g. internal_gather_scatter_fn_supported_p.
We could do something similar here: add an extra operand to the optab,
and an extra argument to the IFN, that gives a bias to be added to the
length.  The PowerPC version would require 0 and the System Z version
would require -1.  The vectoriser would probe to see which value
it should use.

Doing it that way ensures that the gimple is still self-describing.
It avoids gimple semantics depending on target hooks.
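
To make the bias idea concrete, here is a rough standalone model in C.
It is only a sketch of the semantics discussed above, not GCC code:
load_by_count, load_by_highest_index and len_load_model are made-up
names, and the 16-byte vector size is an assumption.  The point is that
with an explicit bias, both targets load the same bytes for the same
gimple-level length.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define VEC_BYTES 16  /* assumed vector size for this model */

/* PowerPC-style model: the operand is a byte count, so 0 loads nothing.  */
static void
load_by_count (unsigned char *dst, const unsigned char *src, unsigned count)
{
  if (count > VEC_BYTES)
    count = VEC_BYTES;
  memcpy (dst, src, count);
}

/* s390-style model: the operand is the index of the highest byte to
   load, so at least one byte is always loaded.  */
static void
load_by_highest_index (unsigned char *dst, const unsigned char *src,
                       unsigned highest)
{
  if (highest > VEC_BYTES - 1)
    highest = VEC_BYTES - 1;
  memcpy (dst, src, highest + 1);
}

/* Hypothetical len_load with an explicit bias: the value handed to the
   instruction is LEN + BIAS.  */
static void
len_load_model (unsigned char *dst, const unsigned char *src,
                unsigned len, int bias)
{
  assert (bias == 0 || bias == -1);
  if (bias == 0)
    load_by_count (dst, src, len);
  else
    {
      /* With a bias of -1 a zero-length load cannot be expressed.  */
      assert (len > 0);
      load_by_highest_index (dst, src, len - 1);
    }
}
```

Calling len_load_model with the same nonzero length and either bias
fills the same prefix of dst; only the zero-length case differs, which
is exactly the case a bias of -1 rules out.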

> There are some regressions that I haven't fully analyzed yet but whether 
> and when to actually enable this feature could be a backend decision 
> with the necessary middle-end checks already in place.
>
> Any ideas on how to properly check for the zero condition and exclude 
> the cases that cause it? Kewen suggested enriching the len_load optabs 
> with a separate parameter.

Yeah, I think that'd be a good approach.  A bias of -1 would indicate
that the target can't cope with zero lengths.
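
Roughly, the probing and the resulting check could look like the
following standalone C sketch.  Everything here is hypothetical
(bias_predicate, probe_len_load_bias, can_use_len_loads and the NO_BIAS
sentinel are invented for illustration and are not existing GCC
interfaces); it just combines the probe with the "more than one length"
restriction from the quick hack above.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a target's operand predicate: returns true
   if the target's len_load pattern accepts BIAS.  */
typedef bool (*bias_predicate) (int bias);

/* Example predicates for the two targets discussed in this thread.  */
static bool accepts_zero_only (int bias) { return bias == 0; }
static bool accepts_minus_one_only (int bias) { return bias == -1; }

#define NO_BIAS 127  /* sentinel: no supported bias found */

/* Try each candidate bias in turn and return the first one the target
   accepts, or NO_BIAS if none is accepted.  */
static int
probe_len_load_bias (bias_predicate pred)
{
  static const int candidates[] = { 0, -1 };
  for (size_t i = 0; i < sizeof candidates / sizeof candidates[0]; i++)
    if (pred (candidates[i]))
      return candidates[i];
  return NO_BIAS;
}

/* Decide whether length-based partial vectors are usable for a loop
   that needs N_LENS distinct length controls.  A bias of -1 means zero
   lengths are not representable, so loops with more than one length
   control are rejected, mirroring the quick hack.  */
static bool
can_use_len_loads (bias_predicate pred, unsigned n_lens)
{
  int bias = probe_len_load_bias (pred);
  if (bias == NO_BIAS || n_lens == 0)
    return false;
  if (bias == -1 && n_lens > 1)
    return false;
  return true;
}
```

With this shape the restriction falls out of the probed bias rather
than out of a separate target hook.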

Thanks,
Richard
