Richard Biener <[email protected]> writes:
> On Thu, Sep 20, 2018 at 3:40 PM Richard Sandiford
> <[email protected]> wrote:
>>
>> Richard Biener <[email protected]> writes:
>> > On Mon, Sep 17, 2018 at 2:40 PM Andrew Stubbs <[email protected]>
>> > wrote:
>> >> On 17/09/18 12:43, Richard Sandiford wrote:
>> >> > OK, sounds like the cost of vec_construct is too low then. But looking
>> >> > at the port, I see you have:
>> >> >
>> >> > /* Implement TARGET_VECTORIZE_BUILTIN_VECTORIZATION_COST. */
>> >> >
>> >> > int
>> >> > gcn_vectorization_cost (enum vect_cost_for_stmt ARG_UNUSED (type_of_cost),
>> >> >                         tree ARG_UNUSED (vectype),
>> >> >                         int ARG_UNUSED (misalign))
>> >> > {
>> >> >   /* Always vectorize. */
>> >> >   return 1;
>> >> > }
>> >> >
>> >> > which short-circuits the cost-model altogether. Isn't that part
>> >> > of the problem?
>> >>
>> >> Well, it's possible that that's a little simplistic. ;-)
>> >>
>> >> Although, actually the elementwise issue predates the existence of
>> >> gcn_vectorization_cost, and the default does appear to penalize
>> >> vec_construct somewhat.
>> >>
>> >> Actually, the default definition doesn't seem to do much besides
>> >> increase vec_construct, so I'm not sure now why I needed to change it?
>> >> Hmm, more experiments to do.
>> >>
>> >> Thanks for the pointer.
>> >
>> > Btw, we do not consider using gather/scatter for VMAT_ELEMENTWISE,
>> > that's a missed "optimization" quite possibly because gather/scatter is so
>> > expensive on x86. Thus the vectorizer should consider this and use the
>> > cheaper alternative according to the cost model (which you of course should
>> > fill with sensible values...).
>>
>> Do you mean it this way round, or that it doesn't consider using
>> VMAT_ELEMENTWISE for natural gather/scatter accesses? We do use
>> VMAT_GATHER_SCATTER instead of VMAT_ELEMENTWISE where possible for SVE,
>> but that relies on implementing the new optabs instead of using the old
>> built-in-based interface, so it doesn't work for x86 yet.
>>
>> I guess we might need some way of selecting between the two if
>> the costs of gather and scatter are context-dependent in some way.
>> But if gather/scatter is always more expensive than VMAT_ELEMENTWISE
>> for certain modes then it's probably better not to define the optabs
>> for those modes.
>
> I think we can't vectorize true gathers (indexed from memory loads) w/o
> gather yet, right?

Right.

> So I really was thinking of implementing VMAT_ELEMENTWISE (invariant
> stride) and VMAT_STRIDED_SLP by composing the appropriate index vector
> with a splat and multiplication and using a gather. I think that's
> not yet implemented?

For SVE we use:

  /* As a last resort, trying using a gather load or scatter store.
     ??? Although the code can handle all group sizes correctly,
     it probably isn't a win to use separate strided accesses based
     on nearby locations. Or, even if it's a win over scalar code,
     it might not be a win over vectorizing at a lower VF, if that
     allows us to use contiguous accesses. */
  if (*memory_access_type == VMAT_ELEMENTWISE
      && single_element_p
      && loop_vinfo
      && vect_use_strided_gather_scatters_p (stmt_info, loop_vinfo,
                                             masked_p, gs_info))
    *memory_access_type = VMAT_GATHER_SCATTER;

in get_group_load_store_type. This only works when the target defines
gather/scatter using optabs rather than built-ins.
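
To illustrate the index-vector idea for the single-element case, here is
a rough standalone sketch in plain C. It is not vectorizer code: NUNITS,
gather, load_elementwise and load_strided_gather are names made up for
the example, the "gather" is just a scalar loop standing in for the
instruction, and the stride is assumed to be in elements rather than
bytes. It only shows that an invariant-stride elementwise access reads
the same data as a gather whose index vector is { 0, 1, 2, ... }
multiplied by a splat of the stride.

```c
#include <assert.h>
#include <stddef.h>

#define NUNITS 4 /* Hypothetical number of vector lanes.  */

/* VMAT_ELEMENTWISE-style access: one scalar load per lane.  */
void
load_elementwise (const int *base, ptrdiff_t stride, int out[NUNITS])
{
  for (int lane = 0; lane < NUNITS; lane++)
    out[lane] = base[lane * stride];
}

/* Scalar model of a gather load: out[lane] = base[index[lane]].  */
void
gather (const int *base, const ptrdiff_t index[NUNITS], int out[NUNITS])
{
  for (int lane = 0; lane < NUNITS; lane++)
    out[lane] = base[index[lane]];
}

/* The same access expressed as a gather.  The index vector is the
   series { 0, 1, 2, ... } multiplied lane-wise by a splat of STRIDE.  */
void
load_strided_gather (const int *base, ptrdiff_t stride, int out[NUNITS])
{
  ptrdiff_t index[NUNITS];
  for (int lane = 0; lane < NUNITS; lane++)
    index[lane] = (ptrdiff_t) lane * stride;
  gather (base, index, out);
}

/* Check that both forms read the same elements for a given stride.
   DATA must hold at least (NUNITS - 1) * STRIDE + 1 elements.  */
void
check_equivalent (const int *data, ptrdiff_t stride)
{
  int a[NUNITS], b[NUNITS];
  load_elementwise (data, stride, a);
  load_strided_gather (data, stride, b);
  for (int lane = 0; lane < NUNITS; lane++)
    assert (a[lane] == b[lane]);
}
```

Whether the gather form is actually a win is of course the cost
question above; the sketch says nothing about that.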

But yeah, no VMAT_STRIDED_SLP support yet. That would be good
to have...

Richard