https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103771
--- Comment #23 from rguenther at suse dot de <rguenther at suse dot de> ---
On Tue, 18 Jan 2022, crazylht at gmail dot com wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103771
>
> --- Comment #22 from Hongtao.liu <crazylht at gmail dot com> ---
> (In reply to Hongtao.liu from comment #21)
> > (In reply to Hongtao.liu from comment #20)
> > > (In reply to Richard Biener from comment #19)
> > > > Ah, so the issue is missing -mavx512bw, which means we end up with an
> > > > AVX2-style mask for V32QImode. With -mavx512bw the code vectorizes fine.
> > >
> > > The vectorized code is worse than before: now we need to pack the
> > > vectorized mask, which takes 3 extra instructions.
> >
> > Currently ifcvt converts
> >
> > ---------dump of .ch_vect-------
> >   if (x.1_14 > 255)
> >     goto <bb 4>; [50.00%]
> >   else
> >     goto <bb 5>; [50.00%]
> >
> >   <bb 4> [local count: 477815112]:
> >   _17 = -_5;
> >   _18 = _17 >> 31;
> >   iftmp.0_19 = (unsigned char) _18;
> >   goto <bb 6>; [100.00%]
> >
> >   <bb 5> [local count: 477815112]:
> >   iftmp.0_20 = (unsigned char) _5;
> >
> >   <bb 6> [local count: 955630225]:
> >   # iftmp.0_21 = PHI <iftmp.0_19(4), iftmp.0_20(5)>
> > -------dump end---------
> >
> >
> > to
> > ---- dump of .ifcvt---------
> >   _41 = -x.1_14;
> >   _17 = (int) _41;
> >   _18 = _17 >> 31;
> >   iftmp.0_19 = (unsigned char) _18;  -- vec_pack_trunc
> >   iftmp.0_20 = (unsigned char) _5;  -- vec_pack_trunc
> >   iftmp.0_21 = x.1_14 > 255 ? iftmp.0_19 : iftmp.0_20;  -- vec_pack_trunc
> >   *_6 = iftmp.0_21;
> >   x_16 = x_24 + 1;
> > -----dump end----------
> >
> >
> > If ifcvt output something like
> > ------------optimal .ifcvt------
> >   _41 = -x.1_14;
> >   _17 = (int) _41;
> >   _18 = _17 >> 31;
> >   iftmp.0_21 = x.1_14 > 255 ? _18 : _5;
> >   iftmp.0_22 = (unsigned char) iftmp.0_21;  -- vec_pack_trunc
> >   *_6 = iftmp.0_22;
> >   x_16 = x_24 + 1;
> > ------------end------------
> >
> > we could save operations for packing the mask (3 vec_pack_trunc vs. 1
> > vec_pack_trunc?).
>
> Or maybe a gimple simplification for it?

Yes, I think that's a candidate for a match.pd simplification. Fortunately,
if-conversion already folds the statements it builds, so such a pattern
would be applied to them there.