On Thu, Aug 6, 2020 at 8:07 PM Marc Glisse <marc.gli...@inria.fr> wrote: > > On Thu, 6 Aug 2020, Christophe Lyon wrote: > > >> Was I on the right track configuring with > >> --target=arm-none-linux-gnueabihf --with-cpu=cortex-a9 > >> --with-fpu=neon-fp16 > >> then compiling without any special option? > > > > Maybe you also need --with-float=hard, I don't remember if it's > > implied by the 'hf' target suffix > > Thanks! That's what I was missing to reproduce the issue. Now I can > reproduce it with just > > typedef unsigned int vec __attribute__((vector_size(16))); > typedef int vi __attribute__((vector_size(16))); > vi f(vec a,vec b){ > return a==5 | b==7; > } > > with -fdisable-tree-forwprop1 -fdisable-tree-forwprop2 > -fdisable-tree-forwprop3 -O1 > > _1 = a_5(D) == { 5, 5, 5, 5 }; > _3 = b_6(D) == { 7, 7, 7, 7 }; > _9 = _1 | _3; > _7 = .VCOND (_9, { 0, 0, 0, 0 }, { -1, -1, -1, -1 }, { 0, 0, 0, 0 }, 107); > > we fail to expand the equality comparison (expand_vec_cmp_expr_p returns > false), while with -fdisable-tree-forwprop4 we do manage to expand > > _2 = .VCONDU (a_5(D), { 5, 5, 5, 5 }, { -1, -1, -1, -1 }, { 0, 0, 0, 0 }, > 112); > > It doesn't make much sense to me that we can expand the more complicated > form and not the simpler form of the same operation (both compare a to 5 > and produce a vector of -1 or 0 of the same size), especially when the > target has an instruction (vceq) that does just what we want. > > Introducing boolean vectors was fine, but I think they should be real > types, that we can operate on, not be forced to appear only as the first > argument of a vcond. > > I can think of 2 natural ways to improve things: either implement vector > comparisons in the ARM backend (possibly by forwarding to their existing > code for vcond), or in the generic expansion code try using vcond if the > direct comparison opcode is not provided. > > We can temporarily revert my patch, but I would like it to be temporary. > Since aarch64 seems to handle the same code just fine, maybe someone who > knows arm could copy the relevant code over? > > Does my message make sense, do people have comments?
So what complicates things now (and to some extent pre-existed when you used AVX512 which _could_ operate on boolean vectors) is that we have split out the condition from VEC_COND_EXPR to separate stmts but we do not expect backends to be able to code-generate the separate form - instead we rely on the ISEL pass to trasform VEC_COND_EXPRs to .VCOND[U] "merging" the compares again. Now that process breaks down once we have things like _9 = _1 | _3; - at some point I argued that we should handle vector compares [and operations on boolean vectors] as well in ISEL but then when it came up again for some reason I disregarded that again. Thus - we don't want to go back to fixing up the generic expansion code (which looks at one instruction at a time and is restricted by TER single-use restrictions). Instead we want to deal with this in ISEL which should behave more intelligently. In the above case it might involve turning the _1 and _3 defs into .VCOND [with different result type], doing _9 in that type and then somehow dealing with _7 ... but this eventually means undoing the match simplification that introduced the code? Not sure if that helps though. Richard. > -- > Marc Glisse