https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98138
--- Comment #5 from Kewen Lin <linkw at gcc dot gnu.org> ---
(In reply to Kewen Lin from comment #4)
> One rough idea seems:
>   1) Relax this condition all_uniform_p somehow to get SLP instance building
> to go deeper and get those p1/p2 loads as SLP nodes.
>   2) Introduce one more vect_pattern recognizer to catch this kind of
> pattern, transform the slp instance as we expect. I assume we can know the
> whole slp instance then we can transform it as we want here. Probably need
> some costing condition to gate this pattern matching.
>   3) If 2) fail, trim the slp instance from those nodes which satisfy
> all_uniform_p condition to ensure it's same as before.
>

For 2), instead of a vect_pattern with an IFN, the appropriate place seems to
be vect_optimize_slp.

But after more thought, building the SLP instance starting from group loads
instead of group stores looks more straightforward.

  a0 = (p1[0] - p2[0]);
  a1 = (p1[1] - p2[1]);
  a2 = (p1[2] - p2[2]);
  a3 = (p1[3] - p2[3]);

Building the vector <a0, a1, a2, a3> first looks more natural. We could then
check the uses of all its lanes for special patterns to form the vector
<t0, t1, t2, t3>, and repeat similarly.

Hi Richi,

Is this a good example to motivate building SLP instances starting from group
loads?
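
To make the a0..a3 group above concrete, here is a minimal C sketch of the same computation in two shapes: the scalar statements as written, and a "grouped" form that treats p1[0..3] and p2[0..3] as two load groups and forms <a0, a1, a2, a3> lane by lane. This only illustrates the data flow and is not vectorizer code. The function names, the uint8_t element type and the int result type are assumptions made for the example. The t0..t3 step is left out because this comment does not define it.

```c
#include <stdint.h>

/* Scalar form, as in the comment: four adjacent loads from p1 and four
   from p2, each pair feeding one subtraction.
   (Hypothetical helper; the element types are assumed.)  */
void
diff4_scalar (const uint8_t *p1, const uint8_t *p2, int a[4])
{
  a[0] = p1[0] - p2[0];
  a[1] = p1[1] - p2[1];
  a[2] = p1[2] - p2[2];
  a[3] = p1[3] - p2[3];
}

/* Grouped form: start from the two load groups, then apply one
   lane-wise subtraction to produce <a0, a1, a2, a3>.
   (Hypothetical helper; models the idea only.)  */
void
diff4_grouped (const uint8_t *p1, const uint8_t *p2, int a[4])
{
  int v1[4], v2[4];

  /* The two group loads.  */
  for (int i = 0; i < 4; i++)
    {
      v1[i] = p1[i];
      v2[i] = p2[i];
    }

  /* One operation applied to every lane.  */
  for (int i = 0; i < 4; i++)
    a[i] = v1[i] - v2[i];
}
```

Both functions produce the same four values. The second simply makes the load groups explicit as the starting point, which is the direction of discovery suggested above.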