Re: Renaming moutline-msabi-xlogues to mcall-ms2sysv-xlogues

2017-04-10 Thread Segher Boessenkool
On Sun, Apr 09, 2017 at 03:52:30PM -0500, Daniel Santos wrote:
> So I've been browsing through the gcc docs for other archs and 
> noticed that they all use different terminology for their options that 
> call or jump to stubs as a substitute for emitting inline saves & 
> restores for registers.
> 
> ARC:  -mno-millicode
> AVR:  -mcall-prologues
> V850: -mno-prolog-function (enabled by default)
> 
> I think that PowerPC/rs6000 does this without an option (or maybe in -Os?).

The rs6000 port determines what to do in the function rs6000_savres_strategy.

Whether or not to do inline saves differs per kind of register
(integer, float, vector), per ABI, and depends on other factors as well:
we always inline if it is just as small, we always inline if the outline
routines wouldn't work, and indeed for some ABIs we inline unless -Os
was used.  There are some more considerations.
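To illustrate the shape of such a per-register-class decision, here is a
schematic sketch in plain C; the enum, function, and parameter names are
invented for the example and are not taken from rs6000_savres_strategy
itself:

```c
#include <stdbool.h>

/* Illustrative only: the kind of choice made per register class.  */
enum savres_choice { SAVRES_INLINE, SAVRES_OUT_OF_LINE };

static enum savres_choice
choose_gpr_savres (int n_regs_to_save, bool outline_routine_usable,
                   bool optimize_size_requested)
{
  /* Always inline when the inline sequence is just as small.  */
  if (n_regs_to_save <= 1)
    return SAVRES_INLINE;
  /* Always inline when the out-of-line routines wouldn't work
     (e.g. a register they need is otherwise occupied).  */
  if (!outline_routine_usable)
    return SAVRES_INLINE;
  /* For some ABIs, out-of-line saves only pay off under -Os.  */
  return optimize_size_requested ? SAVRES_OUT_OF_LINE : SAVRES_INLINE;
}
```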

But yes, there is no option to force different code generation.  This
is a good thing.


Segher


Re: [RFA] update ggc_min_heapsize_heuristic()

2017-04-10 Thread Trevor Saunders
On Sun, Apr 09, 2017 at 10:06:21PM +0200, Markus Trippelsdorf wrote:
> On 2017.04.09 at 21:10 +0200, Markus Trippelsdorf wrote:
> > On 2017.04.09 at 21:25 +0300, Alexander Monakov wrote:
> > > On Sun, 9 Apr 2017, Markus Trippelsdorf wrote:
> > > 
> > > > The minimum size heuristic for the garbage collector's heap, before it
> > > > starts collecting, was last updated over ten years ago.
> > > > It currently has a hard upper limit of 128MB.
> > > > This is too low for current machines where 8GB of RAM is normal.
> > > > So, it seems to me, a new upper bound of 1GB would be appropriate.
> > > 
> > > While the amount of available RAM has grown, so has the number of
> > > available CPU cores (counteracting RAM growth for parallel builds).
> > > Building under a virtualized environment with less-than-host RAM has
> > > also become more common, I think.
> > > 
> > > Bumping it all the way up to 1GB seems excessive; how did you arrive
> > > at that figure? E.g. my recollection from watching a Firefox build is
> > > that most compiler instances need under 0.5GB (RSS).
> > 
> > 1GB was just a number I picked to get the discussion going.
> > And you are right, 512MB looks like a good compromise.
> > 
> > > > Compile times of large C++ projects improve by over 10% due to this
> > > > change.
> > > 
> > > Can you explain a bit more about which projects you've tested? 10+%
> > > looks surprisingly high to me.
> > 
> > I've checked LLVM build times on ppc64le and x86_64.
> 
> Here are the ppc64le numbers (llvm+clang+lld Release build):
> 
> --param ggc-min-heapsize=131072 :
>  ninja -j60  15951.08s user 256.68s system 5448% cpu 4:57.46 total
> 
> --param ggc-min-heapsize=524288 :
>  ninja -j60  14192.62s user 253.14s system 5527% cpu 4:21.34 total

Seriously nice! That said, I do unfortunately see where the "it's too
late in the release cycle" argument is coming from, but I think we
should at least do something for GCC 8.

Trev

> 
> -- 
> Markus


Re: [RFA] update ggc_min_heapsize_heuristic()

2017-04-10 Thread Richard Earnshaw (lists)
On 09/04/17 21:06, Markus Trippelsdorf wrote:
> On 2017.04.09 at 21:10 +0200, Markus Trippelsdorf wrote:
>> On 2017.04.09 at 21:25 +0300, Alexander Monakov wrote:
>>> On Sun, 9 Apr 2017, Markus Trippelsdorf wrote:
>>>
 The minimum size heuristic for the garbage collector's heap, before it
 starts collecting, was last updated over ten years ago.
 It currently has a hard upper limit of 128MB.
 This is too low for current machines where 8GB of RAM is normal.
 So, it seems to me, a new upper bound of 1GB would be appropriate.
>>>
>>> While the amount of available RAM has grown, so has the number of
>>> available CPU cores (counteracting RAM growth for parallel builds).
>>> Building under a virtualized environment with less-than-host RAM has
>>> also become more common, I think.
>>>
>>> Bumping it all the way up to 1GB seems excessive; how did you arrive
>>> at that figure? E.g. my recollection from watching a Firefox build is
>>> that most compiler instances need under 0.5GB (RSS).
>>
>> 1GB was just a number I picked to get the discussion going.
>> And you are right, 512MB looks like a good compromise.
>>
 Compile times of large C++ projects improve by over 10% due to this
 change.
>>>
>>> Can you explain a bit more about which projects you've tested? 10+%
>>> looks surprisingly high to me.
>>
>> I've checked LLVM build times on ppc64le and x86_64.
> 
> Here are the ppc64le numbers (llvm+clang+lld Release build):
> 
> --param ggc-min-heapsize=131072 :
>  ninja -j60  15951.08s user 256.68s system 5448% cpu 4:57.46 total
> 
> --param ggc-min-heapsize=524288 :
>  ninja -j60  14192.62s user 253.14s system 5527% cpu 4:21.34 total
> 

I think that's still too high.  We regularly see quad-core boards with
1G of RAM, or octa-core with 2G, i.e. 256M of RAM per core.

So even that would probably be touch and go after you've accounted for
system memory and other processes on the machine.

Plus, for big systems it's nice to have beefy RAM disks as scratch
areas; they can save a lot of disk IO.

What are the numbers with 256M?

R.


Re: [RFA] update ggc_min_heapsize_heuristic()

2017-04-10 Thread Markus Trippelsdorf
On 2017.04.10 at 10:56 +0100, Richard Earnshaw (lists) wrote:
> On 09/04/17 21:06, Markus Trippelsdorf wrote:
> > On 2017.04.09 at 21:10 +0200, Markus Trippelsdorf wrote:
> >> On 2017.04.09 at 21:25 +0300, Alexander Monakov wrote:
> >>> On Sun, 9 Apr 2017, Markus Trippelsdorf wrote:
> >>>
>  The minimum size heuristic for the garbage collector's heap, before it
>  starts collecting, was last updated over ten years ago.
>  It currently has a hard upper limit of 128MB.
>  This is too low for current machines where 8GB of RAM is normal.
>  So, it seems to me, a new upper bound of 1GB would be appropriate.
> >>>
> >>> While the amount of available RAM has grown, so has the number of
> >>> available CPU cores (counteracting RAM growth for parallel builds).
> >>> Building under a virtualized environment with less-than-host RAM has
> >>> also become more common, I think.
> >>>
> >>> Bumping it all the way up to 1GB seems excessive; how did you arrive
> >>> at that figure? E.g. my recollection from watching a Firefox build is
> >>> that most compiler instances need under 0.5GB (RSS).
> >>
> >> 1GB was just a number I picked to get the discussion going.
> >> And you are right, 512MB looks like a good compromise.
> >>
>  Compile times of large C++ projects improve by over 10% due to this
>  change.
> >>>
> >>> Can you explain a bit more about which projects you've tested? 10+%
> >>> looks surprisingly high to me.
> >>
> >> I've checked LLVM build times on ppc64le and x86_64.
> > 
> > Here are the ppc64le numbers (llvm+clang+lld Release build):
> > 
> > --param ggc-min-heapsize=131072 :
> >  ninja -j60  15951.08s user 256.68s system 5448% cpu 4:57.46 total
> > 
> > --param ggc-min-heapsize=524288 :
> >  ninja -j60  14192.62s user 253.14s system 5527% cpu 4:21.34 total
> > 
> 
> I think that's still too high.  We regularly see quad-core boards with
> 1G of RAM, or octa-core with 2G, i.e. 256M of RAM per core.
> 
> So even that would probably be touch and go after you've accounted for
> system memory and other processes on the machine.

Yes, the calculation in ggc_min_heapsize_heuristic() could be adjusted
to take the number of "cores" into account, so that on an 8GB 4-core
machine it would return 512MB, and less than that for machines with
less memory or higher core counts.
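A minimal sketch of what such a core-aware calculation could look like,
assuming POSIX sysconf() for the host queries (the real code in
ggc-common.c uses its own host-memory helpers); the function name and
the divisor are illustrative only:

```c
#include <stddef.h>
#include <unistd.h>

/* Illustrative only: scale the GC trigger by RAM per core, then clamp.
   Returns a value in kilobytes, like the ggc-min-heapsize param.  */
static size_t
core_aware_min_heapsize (void)
{
  size_t phys_kb = (size_t) sysconf (_SC_PHYS_PAGES)
                   * (size_t) sysconf (_SC_PAGESIZE) / 1024;
  size_t ncores = (size_t) sysconf (_SC_NPROCESSORS_ONLN);
  if (ncores == 0)
    ncores = 1;

  /* A quarter of each core's share of RAM: an 8GB 4-core machine
     yields 8388608 / 4 / 4 = 524288, i.e. 512MB.  */
  size_t kb = phys_kb / ncores / 4;

  /* Clamp to sane bounds (4MB .. 512MB).  */
  if (kb < 4 * 1024)
    kb = 4 * 1024;
  if (kb > 512 * 1024)
    kb = 512 * 1024;
  return kb;
}
```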

> Plus, for big systems it's nice to have beefy ram disks as scratch
> areas, it can save a lot of disk IO.
> 
> What are the numbers with 256M?

Here are the numbers from a 4-core/8-thread 16GB RAM Skylake machine.
They look less stellar than the ppc64le ones (and the variability is smaller):

 --param ggc-min-heapsize=131072
11264.89user 311.88system 24:18.69elapsed 793%CPU (0avgtext+0avgdata 1265352maxresident)k

 --param ggc-min-heapsize=393216
10655.42user 347.92system 23:01.17elapsed 796%CPU (0avgtext+0avgdata 1280476maxresident)k

 --param ggc-min-heapsize=524288
10565.33user 352.90system 22:51.33elapsed 796%CPU (0avgtext+0avgdata 1506348maxresident)k

-- 
Markus


Re: [RFA] update ggc_min_heapsize_heuristic()

2017-04-10 Thread Markus Trippelsdorf
On 2017.04.10 at 12:15 +0200, Markus Trippelsdorf wrote:
> On 2017.04.10 at 10:56 +0100, Richard Earnshaw (lists) wrote:
> > 
> > What are the numbers with 256M?
> 
> Here are the numbers from a 4-core/8-thread 16GB RAM Skylake machine.
> They look less stellar than the ppc64le ones (and the variability is smaller):
> 
>  --param ggc-min-heapsize=131072
> 11264.89user 311.88system 24:18.69elapsed 793%CPU (0avgtext+0avgdata 1265352maxresident)k

 --param ggc-min-heapsize=262144
10778.52user 336.34system 23:15.71elapsed 796%CPU (0avgtext+0avgdata 1277468maxresident)k

>  --param ggc-min-heapsize=393216
> 10655.42user 347.92system 23:01.17elapsed 796%CPU (0avgtext+0avgdata 1280476maxresident)k
> 
>  --param ggc-min-heapsize=524288
> 10565.33user 352.90system 22:51.33elapsed 796%CPU (0avgtext+0avgdata 1506348maxresident)k
-- 
Markus


Re: [RFA] update ggc_min_heapsize_heuristic()

2017-04-10 Thread Segher Boessenkool
On Mon, Apr 10, 2017 at 12:52:15PM +0200, Markus Trippelsdorf wrote:
> >  --param ggc-min-heapsize=131072
> > 11264.89user 311.88system 24:18.69elapsed 793%CPU (0avgtext+0avgdata 1265352maxresident)k
> 
>  --param ggc-min-heapsize=262144
> 10778.52user 336.34system 23:15.71elapsed 796%CPU (0avgtext+0avgdata 1277468maxresident)k
> 
> >  --param ggc-min-heapsize=393216
> > 10655.42user 347.92system 23:01.17elapsed 796%CPU (0avgtext+0avgdata 1280476maxresident)k
> > 
> >  --param ggc-min-heapsize=524288
> > 10565.33user 352.90system 22:51.33elapsed 796%CPU (0avgtext+0avgdata 1506348maxresident)k

So 256MB gets 70% of the speed gain of 512MB, but for only 5% of the cost
in RSS.  384MB is an even better tradeoff for this testcase (but smaller
is safer).

Can the GC not tune itself better?  Or, not cost so much in the first
place ;-)
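For context, here is a sketch of one common self-tuning scheme: a
growth-factor trigger keyed to the data surviving each collection,
similar in spirit to what the ggc-min-expand param already controls.
This is not GGC's actual policy, and the names are invented:

```c
#include <stddef.h>

/* Illustrative growth-factor policy: after a collection, arm the next
   one when the heap exceeds a multiple of the live data, within bounds.  */
static size_t
next_collect_threshold_kb (size_t live_kb, size_t min_kb, size_t max_kb)
{
  size_t next = live_kb * 2;   /* allow 100% growth before collecting again */
  if (next < min_kb)
    next = min_kb;
  if (next > max_kb)
    next = max_kb;
  return next;
}
```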


Segher


Re: [RFA] update ggc_min_heapsize_heuristic()

2017-04-10 Thread Richard Earnshaw (lists)
On 10/04/17 12:06, Segher Boessenkool wrote:
> On Mon, Apr 10, 2017 at 12:52:15PM +0200, Markus Trippelsdorf wrote:
>>>  --param ggc-min-heapsize=131072
>>> 11264.89user 311.88system 24:18.69elapsed 793%CPU (0avgtext+0avgdata 1265352maxresident)k
>>
>>  --param ggc-min-heapsize=262144
>> 10778.52user 336.34system 23:15.71elapsed 796%CPU (0avgtext+0avgdata 1277468maxresident)k
>>
>>>  --param ggc-min-heapsize=393216
>>> 10655.42user 347.92system 23:01.17elapsed 796%CPU (0avgtext+0avgdata 1280476maxresident)k
>>>
>>>  --param ggc-min-heapsize=524288
>>> 10565.33user 352.90system 22:51.33elapsed 796%CPU (0avgtext+0avgdata 1506348maxresident)k
> 
> So 256MB gets 70% of the speed gain of 512MB, but for only 5% of the cost
> in RSS.  384MB is an even better tradeoff for this testcase (but smaller
> is safer).
> 
> Can the GC not tune itself better?  Or, not cost so much in the first
> place ;-)
> 
> 
> Segher
> 

I think the idea of a fixed number is that it avoids problems with bug
reproducibility in the case of memory corruption.

R.


Re: [RFA] update ggc_min_heapsize_heuristic()

2017-04-10 Thread Markus Trippelsdorf
On 2017.04.10 at 13:14 +0100, Richard Earnshaw (lists) wrote:
> On 10/04/17 12:06, Segher Boessenkool wrote:
> > On Mon, Apr 10, 2017 at 12:52:15PM +0200, Markus Trippelsdorf wrote:
> >>>  --param ggc-min-heapsize=131072
> >>> 11264.89user 311.88system 24:18.69elapsed 793%CPU (0avgtext+0avgdata 1265352maxresident)k
> >>
> >>  --param ggc-min-heapsize=262144
> >> 10778.52user 336.34system 23:15.71elapsed 796%CPU (0avgtext+0avgdata 1277468maxresident)k
> >>
> >>>  --param ggc-min-heapsize=393216
> >>> 10655.42user 347.92system 23:01.17elapsed 796%CPU (0avgtext+0avgdata 1280476maxresident)k
> >>>
> >>>  --param ggc-min-heapsize=524288
> >>> 10565.33user 352.90system 22:51.33elapsed 796%CPU (0avgtext+0avgdata 1506348maxresident)k
> > 
> > So 256MB gets 70% of the speed gain of 512MB, but for only 5% of the cost
> > in RSS.  384MB is an even better tradeoff for this testcase (but smaller
> > is safer).
> > 
> > Can the GC not tune itself better?  Or, not cost so much in the first
> > place ;-)
> > 
> > 
> > Segher
> > 
> 
> I think the idea of a fixed number is that it avoids problems with bug
> reproducibility in the case of memory corruption.

Please note that you will get fixed numbers (defined in gcc/params.def)
for all non-release compiler configs. For release builds the numbers
already vary according to the host; they are calculated in ggc-common.c.
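A sketch of that split, with illustrative names and values rather than
the real gcc code (the host-tuned stand-in here replaces the actual
ggc-common.c calculation):

```c
#include <stddef.h>

#define FIXED_DEFAULT_KB 4096         /* params.def-style fixed number */

static size_t host_tuned_kb = 131072; /* stand-in for the host-dependent
                                         ggc-common.c calculation      */

static size_t
effective_min_heapsize_kb (int release_build)
{
  /* Checking (non-release) builds keep the fixed default so GC points,
     and thus GC-related bugs, reproduce across hosts.  */
  return release_build ? host_tuned_kb : FIXED_DEFAULT_KB;
}
```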

-- 
Markus


Release criteria for Darwin

2017-04-10 Thread Simon Wright
I see that, in the GCC 7 Release Criteria, the Secondary Platforms list 
includes i686-apple-darwin.

Should this now be x86_64-apple-darwin? I've been building this since GCC 
4.5.0, Darwin 10, in 2011.

Re: Release criteria for Darwin

2017-04-10 Thread David Edelsohn
On Mon, Apr 10, 2017 at 10:58 AM, Simon Wright  wrote:
> I see that, in the GCC 7 Release Criteria, the Secondary Platforms list 
> includes i686-apple-darwin.
>
> Should this now be x86_64-apple-darwin? I've been building this since GCC 
> 4.5.0, Darwin 10, in 2011.

If the Darwin maintainers concur, this seems like an appropriate change.

Thanks, David


Re: Release criteria for Darwin

2017-04-10 Thread Mike Stump

> On Apr 10, 2017, at 8:17 AM, David Edelsohn  wrote:
> 
> On Mon, Apr 10, 2017 at 10:58 AM, Simon Wright  wrote:
>> I see that, in the GCC 7 Release Criteria, the Secondary Platforms list 
>> includes i686-apple-darwin.
>> 
>> Should this now be x86_64-apple-darwin? I've been building this since GCC 
>> 4.5.0, Darwin 10, in 2011.
> 
> If the Darwin maintainers concur, this seems like an appropriate change.

Yes.  It was safe to do that a long, long time ago.



lvx versus lxvd2x on power8

2017-04-10 Thread Igor Henrique Soares Nunes
Hi all,

I recently came across this old discussion about when/why to use lxvd2x
instead of the lvsl/lvx/vperm/lvx sequence to load elements from memory
into a vector: https://gcc.gnu.org/ml/gcc/2015-03/msg00135.html

I had the same question and was also interested in how these approaches
perform, so I created the following project to check which one is faster
and how memory alignment influences the results:

https://github.com/PPC64/load_vec_cmp

It is a simple program that executes many loads (using both approaches)
in a loop in order to measure which implementation is slower. The project
also considers alignment.
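For readers unfamiliar with the two strategies, here is a minimal sketch
(not code from the linked project) using GCC's altivec.h intrinsics:
vec_vsx_ld typically compiles to lxvd2x (plus a permute on little-endian),
while vec_ld/vec_lvsl/vec_perm express the classic lvx-based unaligned
sequence, shown here in its big-endian form. Compile with -mvsx:

```c
#include <altivec.h>

/* lxvd2x-style load: handles an unaligned P directly.  */
vector double
load_vsx (const double *p)
{
  return vec_vsx_ld (0, p);
}

/* Classic lvsl/lvx/vperm unaligned load of 16 bytes (big-endian idiom;
   little-endian needs a different permute control).  */
vector unsigned char
load_lvx_perm (const unsigned char *p)
{
  vector unsigned char lo = vec_ld (0, p);    /* aligned block holding p   */
  vector unsigned char hi = vec_ld (15, p);   /* next aligned block        */
  vector unsigned char pc = vec_lvsl (0, p);  /* permute control from p%16 */
  return vec_perm (lo, hi, pc);               /* splice the two halves     */
}
```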

As can be seen in this plot
(https://raw.githubusercontent.com/igorsnunes/load_vec_cmp/master/doc/LoadVecCompare.png),
an unaligned load using lxvd2x takes more time.

The previous discussion (as far as I could see) suggests that lxvd2x
performs better than lvsl/lvx/vperm/lvx in all cases. Is that correct?
Is my analysis wrong?

This issue concerns me, since lxvd2x is heavily used in compiled code.

Regards,

Igor


Re: g++ extension for Concepts TS

2017-04-10 Thread Nathan Ridge
cc Andrew Sutton

From: gcc-ow...@gcc.gnu.org on behalf of Christopher Di Bella
Sent: April 2, 2017 8:57 AM
To: gcc Mailing List
Subject: g++ extension for Concepts TS

Hey all,

I've been working on a concept extension that permits type aliases
inside the requirement-seq.
The grammar addition is fairly simple.

```
requirement-seq
   requirement
   alias-declaration
   requirement-seq requirement
```

Semantically, this change forces a requirement-body to open a new
scope to house the alias.

I've managed to get it working for variable concepts, but not function concepts.

It looks like type aliases for some concepts are tricking the compiler
into thinking that there are multiple statements.
For example:

```cpp
template <typename T>
concept bool Foo =
requires(T a) {
   using type = T;
   using value_type = typename std::vector<T>::value_type;
   {a + a} -> value_type;
   {a - a} -> type;
   {a + a} -> typename std::vector<T>::value_type;
   {a - a} -> T;
};
```
works, but

```cpp
template <typename T>
concept bool Foo() {
requires(T a) {
   using type = T;
   using value_type = typename std::vector<T>::value_type;
   {a + a} -> value_type;
   {a - a} -> type;
   {a + a} -> typename std::vector<T>::value_type;
   {a - a} -> T;
};
}
```
fails with

```
test.cpp: In function 'concept bool Foo()':
test.cpp:4:14: error: definition of concept 'concept bool Foo()' has
multiple statements
 concept bool Foo() {
  ^~~
test.cpp: In function 'int main()':
test.cpp:17:10: error: deduced initializer does not satisfy
placeholder constraints
  Foo i = 0;
  ^
test.cpp:17:10: note: in the expansion of concept '(Foo)()'
template<class T> concept bool Foo() [with T = int]
```

After some inspection, I've deduced that the issue is flagged at
constraint.cc:2527, where a DECL_EXPR is found instead of a
RETURN_EXPR.
I'm wondering whether it's trivially possible to ignore these
declarations, e.g. with a loop that somewhat resembles:
E.g. a loop that somewhat resembles:

```cpp
/* Skip leading DECL_EXPRs coming from alias-declarations, then insist
   that what remains is just the RETURN_EXPR (using GCC's
   tree_stmt_iterator idiom; the alias test is left abstract).  */
tree_stmt_iterator it = tsi_start (body);
while (!tsi_end_p (it)
       && TREE_CODE (tsi_stmt (it)) == DECL_EXPR
       /* && the DECL_EXPR declares a type alias */)
  tsi_next (&it);
if (tsi_end_p (it) || TREE_CODE (tsi_stmt (it)) != RETURN_EXPR)
  /* error: definition has multiple statements ... */;
// else cleared of all charges
```

Cheers,

Chris