https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116855
--- Comment #7 from rguenther at suse dot de <rguenther at suse dot de> ---
On Sun, 6 Oct 2024, fxue at os dot amperecomputing.com wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116855
>
> --- Comment #5 from Feng Xue <fxue at os dot amperecomputing.com> ---
> (In reply to Tamar Christina from comment #4)
> > (In reply to Richard Biener from comment #3)
> > > I would suggest to add a STMT_VINFO_ENSURE_NOTRAP or so and delay actual
> > > verification to vectorizable_load when both vector type and VF are fixed.
> > > We'd then likely need a LOOP_VINFO_MUST_USE_PARTIAL_VECTORS_P set
> > > conservatively the other way around from
> > > LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P.
> > > Alignment peeling could then peel if STMT_VINFO_ENSURE_NOTRAP and the
> > > target cannot do full loop masking.
> >
> > Yeah the original reported testcase is fine as the alignment makes it safe.
> > For the manually misaligned case that Andrew posted it makes sense to delay
> > and re-evaluate later on.
> >
> > I don't think we should bother peeling though, I don't think they're that
> > common and alignment peeling breaks some dominators and exposes some
> > existing vectorizer bugs, which is being fixed in Alex's patch.
> >
> > So at least alignment peeling I'll defer to a later point and instead just
> > reject loops that are loading from structures the user misaligned wrt to the
> > vector mode.
> >
> > So mine..
>
> Actually, what I wish is that we could allow vectorization on early break case
> for arbitrary address pointer (knowing nothing about alignment and bound) based
> on some sort of assumption specified via command option under -Ofast, as the
> mentioned example:

I'd rather not have more command-line options gating "unsafe" transforms
but instead have source-level control per loop via pragma.  It should
probably specify a length like simdlen, specifying that accessing
[start + n * accesslen, start + (n+1)*accesslen - 1] is OK when the
scalar loop accesses [start + n * accesslen] or so.

> char * find(char *string, size_t n, char c)
> {
>     for (size_t i = 0; i < n; i++) {
>         if (string[i] == c)
>             return &string[i];
>     }
>     return 0;
> }
>
> and example for which there is no way to do peeling to align more than one
> address pointers:
>
> int compare(char *string1, char *string2, size_t n)
> {
>     for (size_t i = 0; i < n; i++) {
>         if (string1[i] != string2[i])
>             return string1[i] - string2[i];
>     }
>     return 0;
> }
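
To make the access-length idea concrete, here is a rough sketch in plain C of what such a per-loop guarantee would let the vectorizer do for find(). No such pragma exists today; ACCESSLEN and find_blocked are made-up names for illustration, and the block copy only stands in for a vector load. The sketch assumes the caller promises that whenever the scalar loop would read string[base], the whole block string[base .. base + ACCESSLEN - 1] is readable.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical block length; stands in for the pragma's accesslen argument. */
enum { ACCESSLEN = 16 };

/* Returns the same result as find(), but reads ACCESSLEN bytes at a time.
   ASSUMPTION (what the pragma would assert): if the scalar loop would read
   string[base], then string[base .. base + ACCESSLEN - 1] is readable.  */
char *find_blocked(char *string, size_t n, char c)
{
    assert(string != NULL || n == 0);

    for (size_t base = 0; base < n; base += ACCESSLEN) {
        char block[ACCESSLEN];

        /* Full-block read; may touch bytes past string[n - 1].  */
        memcpy(block, string + base, ACCESSLEN);

        /* Only lanes the scalar loop would have visited may match.  */
        size_t lanes = n - base < ACCESSLEN ? n - base : ACCESSLEN;
        for (size_t i = 0; i < lanes; i++)
            if (block[i] == c)
                return &string[base + i];
    }
    return NULL;
}
```

Without the guarantee the full-block read in the last iteration can run past the end of the object, which is exactly the trap the early-break checks are there to rule out. The same per-block promise would have to hold for both string1 and string2 in compare().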