https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92492
Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2019-11-13
             Blocks|                            |53947
     Ever confirmed|0                           |1

--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
t.c:12:14: missed: not vectorized: relevant stmt not supported: _4 = (int) _3;

  # x_25 = PHI <x_13(7), 0(15)>
  # ivtmp_29 = PHI <ivtmp_28(7), 16(15)>
  _1 = (sizetype) x_25;
  _2 = src_9(D) + _1;
  _3 = *_2;
  _4 = (int) _3;
  _5 = dst_10(D) + _1;
  _11 = _4 & -64;
  _14 = -_4;
  _15 = _14 >> 7;
  iftmp.0_16 = (unsigned char) _15;
  iftmp.0_17 = _11 == 0 ? _3 : iftmp.0_16;
  *_5 = iftmp.0_17;
  x_13 = x_25 + 1;
  ivtmp_28 = ivtmp_29 - 1;
  if (ivtmp_28 != 0)
    goto <bb 7>; [93.75%]
  else
    goto <bb 6>; [6.25%]

was the if-converted loop.  This requires unpacking of char to int
so this all boils down to failure to narrow all operations to 'short'.

Might be doable with a match.pd pattern simplifying
((int) X) & -64 == 0 to X & -64 == 0 which we maybe already have
but there's other uses of _4.

ICC also uses effectively two vector sizes, v8qi and v8hi AFAICS?  But
why does it use %ymm then...


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947
[Bug 53947] [meta-bug] vectorizer missed-optimizations
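
For illustration only, here is a rough C sketch of the narrowing idea from
comment #2. The original t.c is not quoted in the comment, so the loop body is
reconstructed from the GIMPLE dump and every name (clip_wide, clip_narrow,
clip_row, count_mismatches) is hypothetical. It assumes src and dst are
unsigned char pointers and that >> on a negative value is an arithmetic shift,
which is what GCC implements. The sketch compares the int-based form with one
that tests the mask on the byte itself and does the negate in 16 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reconstruction of the scalar computation in the dump:
   widen the byte to int, test the high bits, negate and shift. */
static uint8_t clip_wide(uint8_t v)
{
    int a = v;                          /* _4 = (int) _3            */
    if ((a & -64) == 0)                 /* _11 = _4 & -64; == 0 ?   */
        return v;
    return (uint8_t)((-a) >> 7);        /* _14 = -_4; _15 = _14 >> 7 */
}

/* Same result with narrower operations: the mask test is done on the
   byte (-64 truncated to 8 bits is 0xC0) and the negate fits in 16 bits.
   Assumption: >> on a negative value shifts arithmetically, as in GCC. */
static uint8_t clip_narrow(uint8_t v)
{
    int16_t neg = (int16_t)-(int16_t)v; /* range -255..0 fits in short */
    if ((v & 0xC0) == 0)
        return v;
    return (uint8_t)(neg >> 7);
}

/* 16-element loop, matching the trip count visible in the dump
   (ivtmp_29 starts at 16). */
static void clip_row(uint8_t *dst, const uint8_t *src)
{
    for (int x = 0; x < 16; x++)
        dst[x] = clip_narrow(src[x]);
}

/* Exhaustively compare both forms over all 256 byte values and
   return the number of inputs on which they disagree. */
static int count_mismatches(void)
{
    int bad = 0;
    for (int v = 0; v < 256; v++)
        if (clip_wide((uint8_t)v) != clip_narrow((uint8_t)v))
            bad++;
    return bad;
}

/* Assert-based variant of the same exhaustive check. */
static void check_all(void)
{
    for (int v = 0; v < 256; v++)
        assert(clip_wide((uint8_t)v) == clip_narrow((uint8_t)v));
}
```

Since the input is an unsigned char, both forms agree on all 256 values, which
is the property a narrowing transform would rely on. Whether the vectorizer or
a match.pd pattern can actually perform this rewrite when _4 has other uses is
the open question in the comment, and this sketch does not settle it.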