Re: AutoFDO profile toolchain is open-sourced
> > Yes, it will. But it's not well tuned at all. I will start tuning it
> > if I have free cycles. It would be great if opensource community can
> > also contribute to this tuning effort.
>
> If you could outline portions of code which needs tuning, rewriting, that
> will help get started in this effort.

Optimization passes in GCC are generally designed to work with any kind of edge profile they get. There are only a few cases where they care about which kind of profile is around. At the moment we consider two types of profiles: static (guessed) and FDO. For the static one we shut down the use of profile info for some heuristics. For example, we do not expect loop trip counts to be reliable in guessed profiles, because they are not. You can look for code checking profile_status_for_fn.

Auto-FDO does not have a special value for profile_status_for_fn, so it goes down the same code paths as FDO. Dehao has some patches for Auto-FDO tuning, but my impression is that he mostly got around the problems by just making the optimizer a bit more robust against nonsensical profiles, which is always good, since even FDO profiles can be wrong. BTW, Dehao, do you think you can submit these changes for this stage1?

I suppose in this case we have yet another kind of profile that is less reliable than FDO, and we need to start by simply benchmarking, looking for cases where this profile does worse, and handling them one by one :)

Honza
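The gating described above, where a heuristic trusts loop trip counts only when the profile was actually measured, can be sketched roughly as follows. This is a standalone model, not GCC code: the `profile_status` enum, the `function_model` struct, and both helper functions are hypothetical stand-ins for whatever the real `profile_status_for_fn` checks look like in the tree.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-function profile status that
   profile_status_for_fn reports; not GCC's actual definition.  */
enum profile_status { PROFILE_ABSENT, PROFILE_GUESSED, PROFILE_READ };

/* Hypothetical, minimal model of a function containing one loop.  */
struct function_model {
  enum profile_status status;
  long header_count;  /* how often the loop header executed */
  long entry_count;   /* how often the loop was entered */
};

/* Trip counts are trusted only for a measured (FDO-style) profile.
   In this model an Auto-FDO profile would also be PROFILE_READ,
   mirroring the remark that it takes the same paths as FDO.  */
bool trip_counts_reliable(const struct function_model *fn)
{
  return fn->status == PROFILE_READ;
}

/* Return a profile-based trip count when it can be trusted,
   otherwise fall back to a caller-supplied static guess.  */
long estimated_trip_count(const struct function_model *fn, long fallback)
{
  assert(fallback >= 0);
  if (!trip_counts_reliable(fn) || fn->entry_count <= 0)
    return fallback;
  return fn->header_count / fn->entry_count;
}
```

Under this model, tuning for Auto-FDO would mean either making consumers such as `estimated_trip_count` tolerate implausible counts, or introducing a third, less trusted status value; the message above suggests the first approach is what the existing patches mostly do.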
Re: Merging debug-early work?
On 05/08/2015 01:51 AM, Richard Biener wrote:

> Did you see if --with-build-config=bootstrap-lto still works?

Just did on x86-64 Linux. Bootstrap succeeds without any problems.

> While doing the LTO work I wondered why you have the late_global_decl loop in
> toplev.c:compile_file at all (well, maybe due to the missing one for global
> vars output). At least for function decls we should have covered everything
> by that point.

It took a lot of trial and error, with Jason reviewing these entry points, to find the right place to put things and not get a slew of unused DIEs created or unhandled (local statics, C++ clones, optimized away symbols, locals, etc etc). I can't remember the details, though I'm sure we discussed it at length when the patches went into the branch.

Perhaps now that adding an unused decl DIE removal pass is a must, we could make the early/late business less efficient but cleaner, and then have the upcoming DIE removal pass clean things up? Ughh... it did take a while to get right without bloating things too much, but I don't doubt there are ways to streamline it further.

> Early debug seems to be enforced at finalize-compilation-unit time, but only
> for function decls - I wonder why not as well for global vars (and why the
> odd !decl_function_context check is there for functions).

IIRC, stuff living in functions was handled through recursion within dwarf2out. I think we would miss local statics, or some such corner case. I invite you to comment things out and see where things fail in the bootstrap/tests :).

Aldy
How to use old GPU (Fermi) in gcc with OpenACC?
Hi,

I'm trying to use and evaluate gcc with OpenACC on some NVIDIA GPUs. I managed to build gcc with OpenACC support, using http://scelementary.com/2015/04/25/openacc-in-gcc.html as a reference, and I was able to run my programs on a Kepler GPU. However, when I tried older GPUs (Fermi), the programs failed to execute.

I noticed that there are some "sm_30" and "COMPUTE_30" keywords in the gcc and nvptx sources. I changed them to "sm_20" and "COMPUTE_20", but my programs still failed to execute.

Are there any developers who can make gcc with OpenACC support targets other than "sm_30"?

Thanks
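For anyone trying to reproduce the report, a minimal OpenACC kernel of the kind that could serve as a test case is sketched below. The function name `saxpy` and its signature are illustrative and not taken from the original post. It would typically be built with `gcc -fopenacc`; actual offloading additionally needs a GCC build with nvptx offload support, as the post describes. Without `-fopenacc` the pragma is ignored and the loop simply runs on the host, which is handy for checking results. The sketch does not by itself address the sm_30 restriction.

```c
#include <assert.h>
#include <stddef.h>

/* Compute y[i] = a * x[i] + y[i] for i in [0, n).
   With OpenACC enabled, the loop is offered to the accelerator;
   otherwise the pragma is ignored and it runs on the host.  */
void saxpy(int n, float a, const float *x, float *y)
{
  assert(n >= 0 && x != NULL && y != NULL);

  /* copyin: x is only read on the device.
     copy:   y is sent to the device and copied back afterwards.  */
#pragma acc parallel loop copyin(x[0:n]) copy(y[0:n])
  for (int i = 0; i < n; ++i)
    y[i] = a * x[i] + y[i];
}
```

Comparing the output of this function on the host and on the GPU is one simple way to tell a launch failure on Fermi apart from a wrong-result problem.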