Julian Brown <jul...@codesourcery.com> wrote on 11/10/2010 04:29:15 PM:
> In further followups (at the risk of misrepresenting Joseph & Paul
> Brook's opinions!), there seemed to be general agreement that a scheme
> something like that outlined below, with "permuting" loads/stores and
> some way of handling multiple in-register layouts for vectors seems
> like it will be a necessary addition to the vectorizer, going forward.

Hi,

Let me check that I understand the problem first: VLD1 and VST1
instructions in big-endian mode follow the array numbering of elements,
while all other memory instructions (VLDR, VLDM, VSTR, VSTM) do not.
So, do we have two problems here? The first is that VLD1/VST1 and VLDR,
etc. can't be mixed in one computation. The second is that access to a
single element is incorrect when VLDR, etc. are used. Is that correct?

In addition, we need to think about how to represent VLD2/3, so that
the vectorizer can use them. Right?

> I'm thinking (without having much idea about how feasible such an idea
> is) of something along the lines of a function (in the mathematical
> sense) attached to each vector value manipulated by the vectorizer, to
> map that value's element numberings to and from memory offsets.

Joseph Myers <jos...@codesourcery.com> wrote on 08/10/2010 02:54:29 AM:

> Make it possible to describe in generic RTL a permuting
> vector load whose alignment requirement is element alignment, describe
> vld1 that way, and teach the vectorizer how to use such loads and stores.

Does that mean that the vectorizer will be aware of specific instructions?
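
To make the numbering question above concrete, here is a minimal sketch in C that models one 64-bit register holding four 16-bit lanes and compares the two kinds of big-endian load. Everything in it is an assumption made for illustration: the function names are hypothetical, lane 0 is taken to be the least significant 16 bits, the VLDR-style load is modelled as a single 64-bit big-endian read, and the VLD1-style load is modelled as placing array element i in lane i. It is not GCC code and not a statement of how the vectorizer should represent this.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified model: one 64-bit register, four 16-bit lanes.
   Lane 0 is assumed to be the least significant 16 bits.
   All names are hypothetical; none of this is GCC or NEON API. */
enum { NLANES = 4 };

/* Model of a VLDR-style big-endian load: the 8 bytes are read as one
   64-bit big-endian value, so the byte at the lowest address becomes
   the most significant byte of the register. */
static uint64_t load_as_doubleword_be(const uint8_t *mem)
{
    uint64_t reg = 0;
    for (size_t i = 0; i < 8; i++)
        reg = (reg << 8) | mem[i];
    return reg;
}

/* Model of a VLD1.16-style big-endian load: each 16-bit element is read
   on its own (big-endian bytes) and array element i goes to lane i. */
static uint64_t load_as_elements_be(const uint8_t *mem)
{
    uint64_t reg = 0;
    for (size_t i = 0; i < NLANES; i++) {
        uint64_t elt = ((uint64_t)mem[2 * i] << 8) | mem[2 * i + 1];
        reg |= elt << (16 * i);
    }
    return reg;
}

/* Extract one 16-bit lane from the modelled register. */
static uint16_t get_lane(uint64_t reg, size_t lane)
{
    assert(lane < NLANES);
    return (uint16_t)(reg >> (16 * lane));
}

/* The per-value "numbering function": which array element a lane holds
   under each load model. */
static size_t lane_to_element_vld1(size_t lane) { return lane; }
static size_t lane_to_element_vldr(size_t lane) { return NLANES - 1 - lane; }
```

Under this model both loads deliver correct element values, but in different lanes, which is why the two cannot be mixed in one computation without tracking which numbering a given vector value uses.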

I can see several places where the order of elements is important in
the vectorizer's code generation:

- interleave_high/low and widening operations - but I am not sure that
  the current implementation suits NEON best, so maybe those are less
  important
- extraction of scalar result in reduction

  > The ARM implementations of reduction operations
  > fortuitously calculate the results across all elements simultaneously,
  > so when one of those elements is extracted, we still get the right
  > answer.

  So, does that mean that's not a problem?
- various scalar/invariant vectors, including initializations for
  reduction and induction
- the order of elements in loads and stores should match

Thanks,
Ira