Re: Renaming moutline-msabi-xlogues to mcall-ms2sysv-xlogues
On Sun, Apr 09, 2017 at 03:52:30PM -0500, Daniel Santos wrote:
> So I've been browsing through the gcc docs for other archs and noticed
> that they all use different terminology for their options that call or
> jump to stubs as a substitute for emitting inline saves & restores for
> registers.
>
> ARC:  -mno-millicode
> AVR:  -mcall-prologues
> V850: -mno-prolog-function (enabled by default)
>
> I think that PowerPC/rs6000 does this without an option (or maybe in -Os?).

The rs6000 port determines what to do in the function
rs6000_savres_strategy. Whether or not to do inline saves differs per
kind of register (integer, float, vector) and per ABI, and depends on
other factors as well: we always inline if it is just as small, we
always inline if the out-of-line routines wouldn't work, and indeed for
some ABIs we inline unless -Os was used. There are some more
considerations. But yes, there is no option to force different code
generation. This is a good thing.


Segher
Re: [RFA] update ggc_min_heapsize_heuristic()
On Sun, Apr 09, 2017 at 10:06:21PM +0200, Markus Trippelsdorf wrote:
> On 2017.04.09 at 21:10 +0200, Markus Trippelsdorf wrote:
>> On 2017.04.09 at 21:25 +0300, Alexander Monakov wrote:
>>> On Sun, 9 Apr 2017, Markus Trippelsdorf wrote:
>>>> The minimum size heuristic for the garbage collector's heap, before
>>>> it starts collecting, was last updated over ten years ago. It
>>>> currently has a hard upper limit of 128MB. This is too low for
>>>> current machines where 8GB of RAM is normal. So, it seems to me, a
>>>> new upper bound of 1GB would be appropriate.
>>>
>>> While the amount of available RAM has grown, so has the number of
>>> available CPU cores (counteracting RAM growth for parallel builds).
>>> Building under a virtualized environment with less-than-host RAM also
>>> got more common, I think.
>>>
>>> Bumping it all the way up to 1GB seems excessive; how did you arrive
>>> at that figure? E.g. my recollection from watching a Firefox build is
>>> that most compiler instances need under 0.5GB (RSS).
>>
>> 1GB was just a number I picked to get the discussion going. And you
>> are right, 512MB looks like a good compromise.
>>
>>>> Compile times of large C++ projects improve by over 10% due to this
>>>> change.
>>>
>>> Can you explain a bit more, what projects you've tested? 10+% looks
>>> surprisingly high to me.
>>
>> I've checked LLVM build times on ppc64le and x86_64.
>
> Here are the ppc64le numbers (llvm+clang+lld Release build):
>
> --param ggc-min-heapsize=131072:
> ninja -j60  15951.08s user 256.68s system 5448% cpu 4:57.46 total
>
> --param ggc-min-heapsize=524288:
> ninja -j60  14192.62s user 253.14s system 5527% cpu 4:21.34 total

Seriously nice! That said, I do unfortunately see where the "it's too
late in the release cycle" argument is coming from, but I think we
should at least do something for GCC 8.

Trev
Re: [RFA] update ggc_min_heapsize_heuristic()
On 09/04/17 21:06, Markus Trippelsdorf wrote:
> On 2017.04.09 at 21:10 +0200, Markus Trippelsdorf wrote:
>> On 2017.04.09 at 21:25 +0300, Alexander Monakov wrote:
>>> On Sun, 9 Apr 2017, Markus Trippelsdorf wrote:
>>>> The minimum size heuristic for the garbage collector's heap, before
>>>> it starts collecting, was last updated over ten years ago. It
>>>> currently has a hard upper limit of 128MB. This is too low for
>>>> current machines where 8GB of RAM is normal. So, it seems to me, a
>>>> new upper bound of 1GB would be appropriate.
>>>
>>> While the amount of available RAM has grown, so has the number of
>>> available CPU cores (counteracting RAM growth for parallel builds).
>>> Building under a virtualized environment with less-than-host RAM also
>>> got more common, I think.
>>>
>>> Bumping it all the way up to 1GB seems excessive; how did you arrive
>>> at that figure? E.g. my recollection from watching a Firefox build is
>>> that most compiler instances need under 0.5GB (RSS).
>>
>> 1GB was just a number I picked to get the discussion going. And you
>> are right, 512MB looks like a good compromise.
>>
>>>> Compile times of large C++ projects improve by over 10% due to this
>>>> change.
>>>
>>> Can you explain a bit more, what projects you've tested? 10+% looks
>>> surprisingly high to me.
>>
>> I've checked LLVM build times on ppc64le and x86_64.
>
> Here are the ppc64le numbers (llvm+clang+lld Release build):
>
> --param ggc-min-heapsize=131072:
> ninja -j60  15951.08s user 256.68s system 5448% cpu 4:57.46 total
>
> --param ggc-min-heapsize=524288:
> ninja -j60  14192.62s user 253.14s system 5527% cpu 4:21.34 total

I think that's still too high. We regularly see quad-core boards with
1G of RAM, or octa-core with 2G, i.e. 256k/core. So even that would
probably be touch and go after you've accounted for system memory and
other processes on the machine. Plus, for big systems it's nice to have
beefy RAM disks as scratch areas; it can save a lot of disk IO.

What are the numbers with 256M?

R.
Re: [RFA] update ggc_min_heapsize_heuristic()
On 2017.04.10 at 10:56 +0100, Richard Earnshaw (lists) wrote:
> On 09/04/17 21:06, Markus Trippelsdorf wrote:
>> [... quote chain and ppc64le numbers as above ...]
>
> I think that's still too high. We regularly see quad-core boards with
> 1G of RAM, or octa-core with 2G, i.e. 256k/core.
>
> So even that would probably be touch and go after you've accounted for
> system memory and other processes on the machine.

Yes, the calculation in ggc_min_heapsize_heuristic() could be adjusted
to take the number of "cores" into account, so that on an 8GB 4-core
machine it would return 512k, and less than that for machines with less
memory or higher core counts.

> Plus, for big systems it's nice to have beefy RAM disks as scratch
> areas; it can save a lot of disk IO.
>
> What are the numbers with 256M?

Here are the numbers from a 4-core/8-thread 16GB RAM Skylake machine.
They look less stellar than the ppc64le ones (variability is smaller):

--param ggc-min-heapsize=131072
11264.89user 311.88system 24:18.69elapsed 793%CPU (0avgtext+0avgdata 1265352maxresident)k

--param ggc-min-heapsize=393216
10655.42user 347.92system 23:01.17elapsed 796%CPU (0avgtext+0avgdata 1280476maxresident)k

--param ggc-min-heapsize=524288
10565.33user 352.90system 22:51.33elapsed 796%CPU (0avgtext+0avgdata 1506348maxresident)k

--
Markus
Re: [RFA] update ggc_min_heapsize_heuristic()
On 2017.04.10 at 12:15 +0200, Markus Trippelsdorf wrote:
> On 2017.04.10 at 10:56 +0100, Richard Earnshaw (lists) wrote:
>>
>> What are the numbers with 256M?
>
> Here are the numbers from a 4-core/8-thread 16GB RAM Skylake machine.
> They look less stellar than the ppc64le ones (variability is smaller):
>
> --param ggc-min-heapsize=131072
> 11264.89user 311.88system 24:18.69elapsed 793%CPU (0avgtext+0avgdata 1265352maxresident)k

--param ggc-min-heapsize=262144
10778.52user 336.34system 23:15.71elapsed 796%CPU (0avgtext+0avgdata 1277468maxresident)k

> --param ggc-min-heapsize=393216
> 10655.42user 347.92system 23:01.17elapsed 796%CPU (0avgtext+0avgdata 1280476maxresident)k
>
> --param ggc-min-heapsize=524288
> 10565.33user 352.90system 22:51.33elapsed 796%CPU (0avgtext+0avgdata 1506348maxresident)k

--
Markus
Re: [RFA] update ggc_min_heapsize_heuristic()
On Mon, Apr 10, 2017 at 12:52:15PM +0200, Markus Trippelsdorf wrote:
>> --param ggc-min-heapsize=131072
>> 11264.89user 311.88system 24:18.69elapsed 793%CPU (0avgtext+0avgdata 1265352maxresident)k
>
> --param ggc-min-heapsize=262144
> 10778.52user 336.34system 23:15.71elapsed 796%CPU (0avgtext+0avgdata 1277468maxresident)k
>
>> --param ggc-min-heapsize=393216
>> 10655.42user 347.92system 23:01.17elapsed 796%CPU (0avgtext+0avgdata 1280476maxresident)k
>>
>> --param ggc-min-heapsize=524288
>> 10565.33user 352.90system 22:51.33elapsed 796%CPU (0avgtext+0avgdata 1506348maxresident)k

So 256MB gets 70% of the speed gain of 512MB, but for only 5% of the
cost in RSS. 384MB is an even better tradeoff for this testcase (but
smaller is safer).

Can the GC not tune itself better? Or, not cost so much in the first
place ;-)


Segher
Re: [RFA] update ggc_min_heapsize_heuristic()
On 10/04/17 12:06, Segher Boessenkool wrote:
> On Mon, Apr 10, 2017 at 12:52:15PM +0200, Markus Trippelsdorf wrote:
>>> --param ggc-min-heapsize=131072
>>> 11264.89user 311.88system 24:18.69elapsed 793%CPU (0avgtext+0avgdata 1265352maxresident)k
>>
>> --param ggc-min-heapsize=262144
>> 10778.52user 336.34system 23:15.71elapsed 796%CPU (0avgtext+0avgdata 1277468maxresident)k
>>
>>> --param ggc-min-heapsize=393216
>>> 10655.42user 347.92system 23:01.17elapsed 796%CPU (0avgtext+0avgdata 1280476maxresident)k
>>>
>>> --param ggc-min-heapsize=524288
>>> 10565.33user 352.90system 22:51.33elapsed 796%CPU (0avgtext+0avgdata 1506348maxresident)k
>
> So 256MB gets 70% of the speed gain of 512MB, but for only 5% of the
> cost in RSS. 384MB is an even better tradeoff for this testcase (but
> smaller is safer).
>
> Can the GC not tune itself better? Or, not cost so much in the first
> place ;-)
>
> Segher

I think the idea of a fixed number is that it avoids the problem of bug
reproducibility in the case of memory corruption.

R.
Re: [RFA] update ggc_min_heapsize_heuristic()
On 2017.04.10 at 13:14 +0100, Richard Earnshaw (lists) wrote:
> On 10/04/17 12:06, Segher Boessenkool wrote:
>> [... Skylake numbers quoted above ...]
>>
>> So 256MB gets 70% of the speed gain of 512MB, but for only 5% of the
>> cost in RSS. 384MB is an even better tradeoff for this testcase (but
>> smaller is safer).
>>
>> Can the GC not tune itself better? Or, not cost so much in the first
>> place ;-)
>>
>> Segher
>
> I think the idea of a fixed number is that it avoids the problem of bug
> reproducibility in the case of memory corruption.

Please note that you will get fixed numbers (defined in gcc/params.def)
for all non-release compiler configs. For release builds the numbers
already vary according to the host; they get calculated in ggc-common.c.

--
Markus
Release criteria for Darwin
I see that, in the GCC 7 Release Criteria, the Secondary Platforms list includes i686-apple-darwin. Should this now be x86_64-apple-darwin? I've been building this since GCC 4.5.0, Darwin 10, in 2011.
Re: Release criteria for Darwin
On Mon, Apr 10, 2017 at 10:58 AM, Simon Wright wrote:
> I see that, in the GCC 7 Release Criteria, the Secondary Platforms list
> includes i686-apple-darwin.
>
> Should this now be x86_64-apple-darwin? I've been building this since
> GCC 4.5.0, Darwin 10, in 2011.

If the Darwin maintainers concur, this seems like an appropriate change.

Thanks, David
Re: Release criteria for Darwin
> On Apr 10, 2017, at 8:17 AM, David Edelsohn wrote:
>
> On Mon, Apr 10, 2017 at 10:58 AM, Simon Wright wrote:
>> I see that, in the GCC 7 Release Criteria, the Secondary Platforms
>> list includes i686-apple-darwin.
>>
>> Should this now be x86_64-apple-darwin? I've been building this since
>> GCC 4.5.0, Darwin 10, in 2011.
>
> If the Darwin maintainers concur, this seems like an appropriate change.

Yes. It was safe to do that a long, long time ago.
lvx versus lxvd2x on power8
Hi all,

I recently checked this old discussion about when/why to use lxvd2x
instead of lvsl/lvx/vperm/lvx to load elements from memory into a
vector:

https://gcc.gnu.org/ml/gcc/2015-03/msg00135.html

I had the same doubt, and I was also curious how memory alignment
influences the performance of the two approaches. So I created the
following project to check which one is faster and how alignment
affects the results:

https://github.com/PPC64/load_vec_cmp

It is a simple program that executes many loads (using both approaches)
in a loop in order to measure which implementation is slower; the
project also varies the alignment. As can be seen in this plot
(https://raw.githubusercontent.com/igorsnunes/load_vec_cmp/master/doc/LoadVecCompare.png),
an unaligned load using lxvd2x takes more time.

The previous discussion (as far as I could see) concluded that lxvd2x
performs better than lvsl/lvx/vperm/lvx in all cases. Is that correct?
Is my analysis wrong? This concerns me, since lxvd2x is heavily used in
compiled code.

Regards,
Igor
Re: g++ extension for Concepts TS
cc Andrew Sutton

From: gcc-ow...@gcc.gnu.org on behalf of Christopher Di Bella
Sent: April 2, 2017 8:57 AM
To: gcc Mailing List
Subject: g++ extension for Concepts TS

Hey all,

I've been working on a concept extension that permits type aliases
inside the requires-seq. The grammar addition is fairly simple:

```
requirement-seq:
    requirement
    alias-declaration
    requirement-seq requirement
```

Semantically, this change forces a requirement-body to open a new scope
to house the alias. I've managed to get it working for variable
concepts, but not function concepts. It looks like type aliases for
some concepts are tricking the compiler into thinking that there are
multiple statements. For example:

```cpp
template<typename T>
concept bool Foo = requires(T a) {
  using type = T;
  using value_type = typename std::vector<T>::value_type;
  {a + a} -> value_type;
  {a - a} -> type;
  {a + a} -> typename std::vector<T>::value_type;
  {a - a} -> T;
};
```

works, but

```cpp
template<typename T>
concept bool Foo() {
  return requires(T a) {
    using type = T;
    using value_type = typename std::vector<T>::value_type;
    {a + a} -> value_type;
    {a - a} -> type;
    {a + a} -> typename std::vector<T>::value_type;
    {a - a} -> T;
  };
}
```

fails with

```
test.cpp: In function 'concept bool Foo()':
test.cpp:4:14: error: definition of concept 'concept bool Foo()' has multiple statements
 concept bool Foo() {
              ^~~
test.cpp: In function 'int main()':
test.cpp:17:10: error: deduced initializer does not satisfy placeholder constraints
   Foo i = 0;
       ^
test.cpp:17:10: note: in the expansion of concept '(Foo)()'
template<typename T> concept bool Foo() [with T = int]
```

After some inspection, I've deduced that the issue is flagged in
constraint.cc:2527, where a DECL_EXPR is identified instead of a
RETURN_EXPR. I'm wondering if it's trivially possible to ignore these
declarations? E.g.
a loop that somewhat resembles:

```cpp
while (body != NULL_TREE
       && TREE_CODE (STATEMENT_LIST_HEAD (body)->stmt) == DECL_EXPR
       && /* is also an alias declaration */)
  body = STATEMENT_LIST_TAIL (body);

if (body != NULL_TREE)
  error...
// else cleared of all charges
```

Cheers,
Chris