Richard Biener via Gcc-patches <gcc-patches@gcc.gnu.org> writes:

> On Fri, 17 Sep 2021, Richard Biener wrote:
>
>> On Fri, 17 Sep 2021, Richard Sandiford wrote:
>>
>> > Richard Biener <rguent...@suse.de> writes:
>> > > This adds the capability to analyze the dependence of mixed
>> > > pointer/array accesses.  The example is from where using a masked
>> > > load/store creates the pointer-based access when an otherwise
>> > > unconditional access is array-based.  Other examples would include
>> > > accesses to an array mixed with accesses from inlined helpers
>> > > that work on pointers.
>> > >
>> > > The idea is quite simple and old - analyze the data-ref indices
>> > > as if the reference were pointer-based.  The following change does
>> > > this by changing dr_analyze_indices to work on the indices
>> > > sub-structure and storing an alternate indices substructure in
>> > > each data reference.  That alternate set of indices is analyzed
>> > > lazily by initialize_data_dependence_relation when it fails to
>> > > match up the main set of indices of two data references.
>> > > initialize_data_dependence_relation is refactored into a head
>> > > and a tail worker and changed to work on one of the indices
>> > > structures, and thus away from using the DR_* access macros, which
>> > > continue to reference the main indices substructure.
>> > >
>> > > Quite a few vectorization and loop-distribution opportunities
>> > > are unleashed in SPEC CPU 2017: notably 520.omnetpp_r, 548.exchange2_r,
>> > > 510.parest_r, 511.povray_r, 521.wrf_r, 526.blender_r, 527.cam4_r and
>> > > 544.nab_r see amendments in what they report with -fopt-info-loop, while
>> > > the rest of the specrate set sees no changes there.  Measuring runtime
>> > > for the set where changes were reported reveals nothing off-noise
>> > > besides 511.povray_r, which seems to regress slightly for me
>> > > (on a Zen2 machine with -Ofast -march=native).
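[Editorial aside: a compilable sketch of the mixed pointer/array situation described above. All names here are invented for illustration and are not taken from the patch; the comments describe the intended GIMPLE-level shape only informally.]

```c
/* Hypothetical reduction of the pattern the patch targets: the helper
   takes a pointer, the caller uses plain array indexing, so after
   inlining one set of accesses to the same object is pointer-based
   (MEM_REF-style indices) while the other is array-based
   (ARRAY_REF-style indices).  Before the change, the dependence
   analyzer could fail to match up the two forms of indices.  */

static void
vscale_eq (double *v, double s)       /* pointer-based accesses */
{
  v[0] *= s;
  v[1] *= s;
  v[2] *= s;
}

struct sphere
{
  double center[3];
  double radius;
};

void
scale_spheres (struct sphere *spheres, int n, double s)
{
  for (int i = 0; i < n; i++)
    {
      vscale_eq (spheres[i].center, s);  /* pointer-based after inlining */
      spheres[i].radius *= s;            /* plain array/component access */
    }
}
```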
>> > >
>> > > Changes from the [RFC] version are properly handling bitfields
>> > > that we cannot take the address of and optimization of refs
>> > > that already are MEM_REFs and thus won't see any change.  I've
>> > > also elided changing the set of vect_masked_stores targets in
>> > > favor of explicitly listing avx (but I did not verify whether the
>> > > testcase passes on aarch64-sve or amdgcn).
>> > >
>> > > This improves cases like the following from Povray:
>> > >
>> > >    for (i = 0; i < Sphere_Sweep->Num_Modeling_Spheres; i++)
>> > >      {
>> > >        VScaleEq(Sphere_Sweep->Modeling_Sphere[i].Center, Vector[X]);
>> > >        Sphere_Sweep->Modeling_Sphere[i].Radius *= Vector[X];
>> > >      }
>> > >
>> > > where a plain array access is mixed with abstraction
>> > > using T[] or T* arguments.  That should be a not too uncommon
>> > > situation in the wild.  The loop above is now vectorized and was not
>> > > without the change.
>> > >
>> > > Bootstrapped and tested on x86_64-unknown-linux-gnu and I've
>> > > built and run SPEC CPU 2017 successfully.
>> > >
>> > > OK?
>> >
>> > Took a while to page this stuff back in :-/
>> >
>> > I guess if we're adding alt_indices to the main data_reference,
>> > we'll need to free the access_fns in free_data_ref.  It looked like
>> > the patch as posted might have a memory leak.
>>
>> Doh, yes - thanks for noticing.
>>
>> > Perhaps instead we could use local indices structures for the
>> > alt_indices and pass them in to initialize_data_dependence_relation?
>> > Not that that's very elegant…
>>
>> Yeah, I had that, but since for N data references we possibly
>> call initialize_data_dependence_relation N^2 times, we'd do this
>> alternate analysis N^2 times at worst instead of N, so it looked worth
>> caching it in the data reference.
>> Since we have no idea why the
>> first try fails, we're going to try this fallback in the majority
>> of cases that we cannot figure out otherwise, so I didn't manage
>> to hand-wave the quadraticness away ;)  OTOH one might argue
>> it's just a constant factor on top of the number of
>> initialize_data_dependence_relation invocations.
>>
>> So I can probably be convinced either way - guess I'll gather
>> some statistics.
>
> I built SPEC 2017 CPU rate with -Ofast -march=znver2; overall there
> are
>
>   4433976 calls to the first stage initialize_data_dependence_relation
>           (skipping the cases dr_may_alias returned false)
>    360248 (8%) ended up recursing with a set of alt_indices
>     83512 times we computed alt_indices of a DR (that's with the cache)
>     14905 (0.3%) times the recursive invocation ended with != chrec_dont_know
>
> thus when not doing the caching we'd compute alt_indices about 10 times
> more often.  I didn't collect the number of distinct DRs (that's difficult
> at this point), but I'd estimate from the above that we have 3 times
> more "unused" alt_indices than used.
>
> OK, so that didn't really help me avoid flipping a coin ;)
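[Editorial aside: the percentages and the "about 10 times" estimate above can be sanity-checked from the raw counts. The factor of two per recursion below is an assumption on my part — that an uncached recursive invocation would need alt_indices for both data references of the pair — not something stated in the thread.]

```c
/* Raw counts from the SPEC 2017 rate measurement quoted above.  */
#define TOTAL_CALLS  4433976  /* first-stage initialize_data_dependence_relation calls */
#define RECURSED      360248  /* calls that recursed with a set of alt_indices */
#define ALT_COMPUTED   83512  /* alt_indices computations of a DR (with the cache) */
#define SUCCEEDED      14905  /* recursions ending with != chrec_dont_know */

/* Fraction of first-stage calls that took the alt_indices fallback:
   about 8.1%, matching the quoted 8%.  */
static double
recursed_pct (void)
{
  return 100.0 * RECURSED / TOTAL_CALLS;
}

/* Fraction of first-stage calls where the fallback actually found an
   answer: about 0.34%, matching the quoted 0.3%.  */
static double
success_pct (void)
{
  return 100.0 * SUCCEEDED / TOTAL_CALLS;
}

/* Recompute factor without the cache, assuming each recursive
   invocation computes alt_indices for both DRs of the pair: about
   8.6x, the same ballpark as the quoted "about 10 times".  */
static double
uncached_factor (void)
{
  return 2.0 * RECURSED / ALT_COMPUTED;
}
```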
Sounds like a good justification for keeping the caching to me :-)

Richard