https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88545
--- Comment #11 from Tamar Christina <tnfchris at gcc dot gnu.org> --- (In reply to Jonathan Wakely from comment #9) > Patch posted: https://gcc.gnu.org/pipermail/gcc-patches/2024-June/653731.html > > Rerunning benchmarks with this patch would be very welcome. OK, I have tested the patches on AArch64 and also compared against our vectorized cases. I tested 5 needle positions (where the element it's searching for is found): 1 0 100 1000 10000 -1 where 0 is never find, and -1 means fine last: Results are: +--------+-----------+---------+---------+ | NEEDLE | scalar 1x | vect | memchr | +--------+-----------+---------+---------+ | 1 | -11.67% | -14.95% | -12.92% | | 0 | 137.48% | -82.31% | -83.36% | | 100 | 3.75% | 17.06% | 8.02% | | 1000 | -10.34% | -10.83% | 0.29% | | 10000 | -3.25% | -4.97% | -5.19% | | -1 | 10.28% | -31.17% | -33.93% | +--------+-----------+---------+---------+ So it looks like, staying with scalar the unrolling still has a positive effect, but calling memchr has an effect on longer searches, shorter ones. the vector loop as currently vectorized has about a 10% unneeded overhead which we will be working on this year. But otherwise it's also a significant win for longer searches. So perhaps an idea is to use memchr for bytes, for everything else remove the unrolled code and let the vectorizer take care of it, and if that fails let the RTL or tree unroller do it? for completeness, my benchmark was: #include <algorithm> #include <string> #include <fstream> #include <sstream> #include <stdio.h> #ifndef NEEDLE #define NEEDLE 50 #endif #ifndef ITERS #define ITERS 1000 #endif __attribute__((noipa)) bool doIt (const char* s, char v, size_t len) { const char* l = s + len; const char* r = std::find (s, l, v); return (r != l); } int main () { std::ifstream t("find.data"); std::stringstream buffer; buffer << t.rdbuf(); std::string content = buffer.str(); if (NEEDLE > 0) content[NEEDLE-1] = '|'; else if (NEEDLE < 0) content[content.length()-1] = '|'; bool found = false; for (int i = 0; i < ITERS; i++) found = found | doIt (content.c_str (), '|', content.length ()); return found; }