Re: VEC_COND_EXPR optimizations v2

Richard Biener via Gcc-patches Thu, 06 Aug 2020 23:39:28 -0700

On Thu, Aug 6, 2020 at 8:07 PM Marc Glisse <marc.gli...@inria.fr> wrote:
>
> On Thu, 6 Aug 2020, Christophe Lyon wrote:
>
> >> Was I on the right track configuring with
> >> --target=arm-none-linux-gnueabihf --with-cpu=cortex-a9
> >> --with-fpu=neon-fp16
> >> then compiling without any special option?
> >
> > Maybe you also need --with-float=hard; I don't remember if it's
> > implied by the 'hf' target suffix.
>
> Thanks! That's what I was missing to reproduce the issue. Now I can
> reproduce it with just
>
> typedef unsigned int vec __attribute__((vector_size(16)));
> typedef int vi __attribute__((vector_size(16)));
> vi f(vec a,vec b){
>      return a==5 | b==7;
> }
>
> with -fdisable-tree-forwprop1 -fdisable-tree-forwprop2 -fdisable-tree-forwprop3 -O1
>
>    _1 = a_5(D) == { 5, 5, 5, 5 };
>    _3 = b_6(D) == { 7, 7, 7, 7 };
>    _9 = _1 | _3;
>    _7 = .VCOND (_9, { 0, 0, 0, 0 }, { -1, -1, -1, -1 }, { 0, 0, 0, 0 }, 107);
>
> we fail to expand the equality comparison (expand_vec_cmp_expr_p returns
> false), while with -fdisable-tree-forwprop4 we do manage to expand
>
>    _2 = .VCONDU (a_5(D), { 5, 5, 5, 5 }, { -1, -1, -1, -1 }, { 0, 0, 0, 0 }, 112);
>
> It doesn't make much sense to me that we can expand the more complicated
> form and not the simpler form of the same operation (both compare a to 5
> and produce a vector of -1 or 0 of the same size), especially when the
> target has an instruction (vceq) that does just what we want.
>
> Introducing boolean vectors was fine, but I think they should be real
> types, that we can operate on, not be forced to appear only as the first
> argument of a vcond.
>
> I can think of 2 natural ways to improve things: either implement vector
> comparisons in the ARM backend (possibly by forwarding to their existing
> code for vcond), or make the generic expansion code try vcond when the
> direct comparison opcode is not provided.
>
> We can temporarily revert my patch, but I would like it to be temporary.
> Since aarch64 seems to handle the same code just fine, maybe someone who
> knows arm could copy the relevant code over?
>
> Does my message make sense, do people have comments?
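To make the "same operation" point concrete, here is a minimal value-level sketch in C using GCC's vector extensions, reusing the `vec`/`vi` types from the test case. `cmp_eq_direct` is the plain vector comparison; `vcondu_eq_model` is a hypothetical lane-by-lane model of a .VCONDU-style select with an equality condition and -1/0 arms. It only checks that the two forms compute the same mask; it says nothing about which optabs the ARM backend actually provides.

```c
#include <stddef.h>

typedef unsigned int vec __attribute__((vector_size(16)));
typedef int vi __attribute__((vector_size(16)));

/* Number of 32-bit lanes in a 16-byte vector. */
#define LANES (sizeof(vec) / sizeof(unsigned int))

/* Direct vector comparison (GCC vector extension): each lane of the
   result is -1 (all ones) when a[i] == b[i], else 0. */
static vi cmp_eq_direct(vec a, vec b)
{
    return a == b;
}

/* Hypothetical lane-by-lane model of a .VCONDU-style select with an
   equality condition: result[i] = (a[i] == b[i]) ? t[i] : e[i]. */
static vi vcondu_eq_model(vec a, vec b, vi t, vi e)
{
    vi r;
    for (size_t i = 0; i < LANES; i++)
        r[i] = (a[i] == b[i]) ? t[i] : e[i];
    return r;
}
```

For example, with `a = {5, 1, 5, 9}` compared against `{5, 5, 5, 5}` and arms `{-1, -1, -1, -1}` / `{0, 0, 0, 0}`, both functions return `{-1, 0, -1, 0}`.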


So what complicates things now (and to some extent already did when you
used AVX512, which _could_ operate on boolean vectors) is that we
have split the condition out of VEC_COND_EXPR into separate stmts,
but we do not expect backends to be able to code-generate the separate
form. Instead we rely on the ISEL pass to transform VEC_COND_EXPRs
into .VCOND[U], "merging" the compares again.  That process breaks
down once we have statements like _9 = _1 | _3.  At some point I argued
that we should also handle vector compares [and operations on boolean
vectors] in ISEL, but when it came up again I dismissed the idea for
some reason.

Thus we don't want to go back to fixing up the generic expansion code
(which looks at one instruction at a time and is limited by TER's
single-use restriction).  Instead we want to deal with this in ISEL, which
should behave more intelligently.  In the above case that might involve
turning the _1 and _3 defs into .VCOND [with a different result type],
doing _9 in that type and then somehow dealing with _7 ... but doesn't
this eventually mean undoing the match simplification that introduced
the code?
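The rewrite sketched above can be modeled at the level of values in C with GCC's vector extensions. This is a minimal sketch under stated assumptions: `vcondu_eq_to_int` is a hypothetical stand-in for a .VCONDU whose result type is the integer vector `vi`, and `f_isel_model` only mirrors the shape of _1, _3, _9 and _7 from the dump; it says nothing about what ISEL actually emits or how _7 should be handled in general.

```c
#include <stddef.h>

typedef unsigned int vec __attribute__((vector_size(16)));
typedef int vi __attribute__((vector_size(16)));

#define LANES (sizeof(vec) / sizeof(unsigned int))

/* Original test case: two compares combined with |. */
static vi f_source(vec a, vec b)
{
    return (a == 5) | (b == 7);
}

/* Hypothetical stand-in for a .VCONDU whose result type is the integer
   vector vi: lane i is -1 if x[i] == k, else 0. */
static vi vcondu_eq_to_int(vec x, unsigned int k)
{
    vi r;
    for (size_t i = 0; i < LANES; i++)
        r[i] = (x[i] == k) ? -1 : 0;
    return r;
}

/* Sketch of the rewrite: _1 and _3 become integer masks, _9 is a plain
   integer OR, and the final select maps a nonzero lane to -1. */
static vi f_isel_model(vec a, vec b)
{
    vi m1 = vcondu_eq_to_int(a, 5);   /* _1 */
    vi m3 = vcondu_eq_to_int(b, 7);   /* _3 */
    vi m9 = m1 | m3;                  /* _9, now in the integer type */
    vi r;
    for (size_t i = 0; i < LANES; i++)
        r[i] = (m9[i] != 0) ? -1 : 0; /* _7: identity on a -1/0 mask */
    return r;
}
```

Because _1 and _3 are -1/0 masks here, their OR is also a -1/0 mask, so in this particular case the final select on _9 reduces to the identity; the model does not show whether that holds for other comparison codes or select arms.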

Not sure if that helps though.

Richard.

> --
> Marc Glisse
