https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98176
Richard Biener <rguenth at gcc dot gnu.org> changed: What |Removed |Added ---------------------------------------------------------------------------- Ever confirmed|0 |1 CC| |hubicka at gcc dot gnu.org Status|UNCONFIRMED |NEW Last reconfirmed| |2020-12-08 --- Comment #5 from Richard Biener <rguenth at gcc dot gnu.org> --- (In reply to Hongyu Wang from comment #4) > (In reply to Richard Biener from comment #3) > > > I see ret[0] has store-motion applied. You don't see it vectorized > > because GCC doesn't know how to vectorize sincos (or cexpi which is > > what it lowers it to). > > I doubt so, after manually store motion > > #include <cmath> > > float foo( > int *x, > int n, > float tx > ) > { > float ret[n]; > float tmp; > > #pragma omp simd > for (int i = 0; i < n; i++) > { > float s, c; > > sincosf( tx * x[i] , &s, &c ); > > tmp += s*c; > } > > ret[0] += tmp; > > return ret[0]; > } > > with -Ofast -fopenmp-simd -std=c++11 it could be vectorized to call > _ZGVbN4vvv_sincosf > > ret[0] is moved for sinf() case, but not sincosf() with above options. What target are you targeting? Can you provide the sincosf prototype from your math.h? (please attach preprocessed source). I cannot reproduce sincosf _not_ being lowered to cexpif and thus no longer having memory writes. > > > > If you replace sincosf with a random call then you'll hit the issue > > that LIMs dependence analysis doesn't handle it at all since it cannot > > represent it. That will block further optimization in the loop. > > > > That can possibly be improved. > > > > So could LIMs dependence analysis handle known library function and just > analyze their memory parameter? Random call may have unknown behavior. Well, in gather_mem_refs_stmt we have mem = simple_mem_ref_in_stmt (stmt, &is_stored); if (!mem) { /* We use the shared mem_ref for all unanalyzable refs. */ id = UNANALYZABLE_MEM_ID; so in this path you could in theory special-case sincos{,f,l} and handle it by adding two refs. Note this will likely break down once LIM decides to apply some motion to one of the refs since it doesn't know how to re-materialize them (they'll be treated as independent but are really tied together). So the other option is to improve things by having multiple "UNANALYZABLE_MEM_ID"s and while not handling them as candidates for invariant or store motion handle them in the dependence analysis part, possibly by just using the classical stmt-based dependence tests. But this requires quite some refactoring. > > > if (nonpure_call_p (stmt)) > > > { > > > maybe_never = true; > > > outermost = NULL; > > > } > > > > > > So no store-motion chance for any future statement in such block. > > > > That's another issue - the call may not return. Here the granularity > > is per BB and thus loads/stores in the same BB are not considered for > > sinking. > > > > IMHO the condition may be too strict for known library calls. Yes. For a LIM testcase an example with a memcpy might be more practically relevant. For refactoring I'd start with classifying the unanalyzable refs as separate ref ID, marking it with another bit like ref_unanalyzed in in_mem_ref and asserting there's a single access of such refs. The mem_refs_may_alias_p code then needs to use stmt-based alias queries instead of refs_may_alias_p_1 using accesses_in_loop[0]->stmt. And code testing for UNANALYZABLE_MEM_ID now needs to look at the ref_unanalyzed flag to not consider those refs for transforms. Note this may blow up the memory requirements for testcases with lots of "unanalyzable" refs. The nonpure-call code is more difficult to improve, even sincos can not return when the access to s or c traps. Analyzing the arguments might help here. If you disregard that detail I think all ECF_LEAF|ECF_NOTHROW functions return normally.