Re: Project Ranger

2018-05-30 Thread Eric Botcazou
> The Ranger is far enough along now that we have confidence in both its
> approach and ability to perform, and would like to solicit feedback on
> what you think of it,  any questions, possible uses,  as well as
> potential requirements to integrate with trunk later this stage.

The PDF document mentions that you first intended to support symbolic ranges 
but eventually dropped them as "too complex, and ultimately not necessary".

I don't entirely disagree with the former part, but I'm curious about the 
latter part: how do you intend to deal in the long term with cases that do 
require symbolic information to optimize things?  The TODO page seems to 
acknowledge the loophole but only mentions a plan to deal with equivalences, 
which is not sufficient in the general case (as acknowledged too on the page).

-- 
Eric Botcazou


Re: Enabling -ftree-slp-vectorize on -O2/Os

2018-05-30 Thread Richard Biener
On Tue, May 29, 2018 at 5:24 PM Allan Sandfeld Jensen 
wrote:

> On Tuesday, 29 May 2018 16:57:56 CEST Richard Biener wrote:
> >
> > so the situation improves but isn't fully fixed (STLF issues maybe?)
> >

> That raises the question if it helps in these cases even in -O3?

That's a good question indeed.  We might end up (partly) BB vectorizing
loop bodies that we'd otherwise loop vectorize with SLP.  Benchmarking
with BB vectorization disabled at -O3+ isn't something I've done in the
past.  I'm now doing a 2-run with -march=haswell -Ofast
[-fno-tree-slp-vectorize] for the FP benchmarks.

Note that there were some cases where disabling vectorization wholly
improved things.

> Anyway it doesn't look good for it. Did the binary size at least improve
> with prefer-avx128, or was that also worse or insignificant?

Similar to the AVX256 results.  I guess where AVX256 applied we now simply
do two vector ops with AVX128.

Richard.


> 'Allan


Re: Project Ranger

2018-05-30 Thread Andrew MacLeod

On 05/30/2018 03:41 AM, Eric Botcazou wrote:

> > The Ranger is far enough along now that we have confidence in both its
> > approach and ability to perform, and would like to solicit feedback on
> > what you think of it,  any questions, possible uses,  as well as
> > potential requirements to integrate with trunk later this stage.
>
> The PDF document mentions that you first intended to support symbolic ranges
> but eventually dropped them as "too complex, and ultimately not necessary".
>
> I don't entirely disagree with the former part, but I'm curious about the
> latter part: how do you intend to deal in the long term with cases that do
> require symbolic information to optimize things?  The TODO page seems to
> acknowledge the loophole but only mentions a plan to deal with equivalences,
> which is not sufficient in the general case (as acknowledged too on the page).

First, we'll collect the cases that demonstrate a unique situation we 
care about.  I have 4 very specific cases that show current 
shortcomings, not just with the Ranger, but also a couple we don't handle 
with VRP today.  I'll eventually get those put onto the wiki so the 
list can be updated.


I think most of these cases that care about symbolics are not so much 
range related, but rather an algorithmic layer on top.  Any follow-on 
optimization to either enhance or replace VRP or anything similar will 
simply use the Ranger as a client.  If it turns out there are cases 
where we *have* to remember the end point of a range as a symbolic, then 
the algorithm can track that symbolic along with the range, and request a 
re-evaluation of the range when the value of that symbolic changes.


That's the high-level view.  I'm not convinced the symbolic has to be in 
the range in order to solve problems, for 2 reasons:


 1)  The Ranger maintains some definition chains internally and has a 
clear idea of what ssa_names can affect the outcome of a range. It 
tracks all these dependencies on a per-bb basis in the gori-map 
structure as imports and exports.  The iterative approach I mentioned 
in the document would use this info to decide that ranges in a block 
need to be re-evaluated because an input to this block has changed.  
This is similar to the way VRP and other passes iterate until nothing 
changes.  If we get a better range for an ssa_name than we had before, 
push it on the stack to look for potential re-evaluation, and keep going.


This is what I referred to as the Level 4 ranger in the document. I kind 
of glossed over it because I didn't want to get into the full-blown 
design I originally had, nor to rework it based on the current 
incarnation of the Ranger.  I wanted to primarily focus on what we 
currently have that is working so we can move forward with it.


I don't think we need to track the symbolic name in the range because 
the Ranger tracks these dependencies for each ssa-name and can indicate 
when we may need to reevaluate them.  There is an exported routine from 
the block ranger:

  tree single_import (tree name);

If there is a sole import, it will return the ssa-name that NAME is 
dependent on that can affect the range of NAME.  We added that API so 
Aldy's new threading code could utilize this ability to a small degree.


Bottom line: The ranger has information that a pass can use to decide 
that a range could benefit from being reevaluated. This identifies any 
symbolic component of a range from that block.
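
As a rough illustration of the iterative scheme in point 1, here is a
minimal sketch in plain C++.  It is not GCC code: Range, Stmt, add_bound
and propagate are hypothetical names made up for this example, and the
only statement form modelled is "dst = src + constant", with src playing
the role of the single import of dst.  The one idea taken from the text
above is that a name whose range improves is pushed on a stack so that
whatever depends on it gets re-evaluated.

```cpp
#include <algorithm>
#include <climits>
#include <map>
#include <string>
#include <vector>

// Hypothetical closed interval [lo, hi]; not GCC's range class.
// LONG_MIN / LONG_MAX stand for "unbounded".
struct Range {
  long lo, hi;
  Range() : lo(LONG_MIN), hi(LONG_MAX) {}
  Range(long l, long h) : lo(l), hi(h) {}
  bool operator==(const Range &o) const { return lo == o.lo && hi == o.hi; }
};

// Toy statement "dst = src + addend"; src is the single import of dst.
struct Stmt {
  std::string dst;
  std::string src;
  long addend;
};

// Add a constant to a bound, keeping unbounded ends unbounded and
// saturating instead of overflowing.
static long add_bound(long a, long b) {
  if (a == LONG_MIN || a == LONG_MAX) return a;
  if (b > 0 && a > LONG_MAX - b) return LONG_MAX;
  if (b < 0 && a < LONG_MIN - b) return LONG_MIN;
  return a + b;
}

// Re-evaluate ranges until nothing changes, starting from the names in
// `changed`.  Assumes the statements form no dependency cycles.
std::map<std::string, Range>
propagate(const std::vector<Stmt> &stmts, std::map<std::string, Range> ranges,
          std::vector<std::string> changed) {
  while (!changed.empty()) {
    std::string name = changed.back();
    changed.pop_back();
    for (const Stmt &s : stmts) {
      if (s.src != name)
        continue;  // this statement does not import `name`
      Range in = ranges[name];
      Range out(add_bound(in.lo, s.addend), add_bound(in.hi, s.addend));
      Range &old = ranges[s.dst];
      // Intersect the new result with what was already known.
      Range merged(std::max(old.lo, out.lo), std::min(old.hi, out.hi));
      if (!(merged == old)) {
        old = merged;              // better range found
        changed.push_back(s.dst);  // dependents must be looked at again
      }
    }
  }
  return ranges;
}
```

With the statements b = a + 1 and c = b + 2, and a known to be in
[0, 10], starting the worklist at a yields [1, 11] for b and [3, 13]
for c.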


 2) This entire approach is modeled on walking the IL to evaluate a 
range.  If we put symbolics and expressions in the range, we are really 
duplicating information that is already in the IL, and have to make a 
choice of exactly what and how we do it.

BB3:
   j_1 = q_6 / 2
   i_2 = j_1 + 3
   if ( i_2 < k_4)

we could store the range of k_4 as [i_2 + 1, MAX]  (which seems the 
obvious one)

we could also store it as [j_1 + 4, MAX]
or even [q_6 / 2 + 4, MAX].  But we have to decide in advance, and we 
have extra work to do if it turns out to be one of the other names we 
ended up wanting.


At some point later on we decide we either don't know anything about i_2 
(or j_1, or q_6), or we have found a range for it, and now need to take 
that value and evaluate the expression stashed in the range in order to 
get the final result.   Note that whatever algorithm is doing this must 
also keep track of this range somehow in order to use it later.


With the Ranger model, the same algorithm gets a range, and if it thinks 
it might need to be re-evaluated for whatever reason, it can just track 
an extra bit of info (like i_2 for instance) alongside the range 
(rather than in it).  If it thinks the range needs to be re-evaluated, 
it can simply request a new range from the Ranger.
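
To make the BB3 example concrete, here is a small sketch in plain C++ of
a client that keeps the range of k_4 together with the name it depends
on, and asks for a fresh range only when that name changes.  All names
(Range, TrackedRange, range_of_k4_on_true_edge, track_k4, maybe_refresh)
are hypothetical and not part of the Ranger API; the function that
"walks the IL" is hard-coded for the three statements shown above.

```cpp
#include <climits>
#include <string>

// Hypothetical closed interval [lo, hi]; LONG_MAX stands for MAX.
struct Range {
  long lo;
  long hi;
};

// Range of k_4 on the true edge of "if (i_2 < k_4)", derived from the
// range of q_6 by walking  j_1 = q_6 / 2;  i_2 = j_1 + 3.
Range range_of_k4_on_true_edge(Range q6) {
  Range j1{q6.lo / 2, q6.hi / 2};  // truncating division by 2 is monotonic
  Range i2{j1.lo + 3, j1.hi + 3};
  return Range{i2.lo + 1, LONG_MAX};  // k_4 > i_2 and i_2 >= i2.lo
}

// What a client pass might remember: the range itself, plus the name
// whose value the range depends on, kept alongside rather than inside.
struct TrackedRange {
  Range range;
  std::string depends_on;
};

TrackedRange track_k4(Range q6) {
  return TrackedRange{range_of_k4_on_true_edge(q6), "q_6"};
}

// Request a new range only if the changed name is the tracked dependency.
// Returns true when the range was re-evaluated.
bool maybe_refresh(TrackedRange &t, const std::string &changed,
                   Range new_value) {
  if (changed != t.depends_on)
    return false;
  t.range = range_of_k4_on_true_edge(new_value);
  return true;
}
```

If q_6 is known to be in [0, 10], the lower bound for k_4 is
0 / 2 + 3 + 1 = 4.  If q_6 is later narrowed to [8, 10], re-evaluating
gives 8 / 2 + 3 + 1 = 8, while a change to an unrelated name leaves the
stored range untouched.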


You also don't have to decide whether to track the range with i_2 or j_1 
(or even q_6). The Ranger can tell you that the range it gives you for 
k_4 is accurate unless you get a new value for q_6. That is really what 
you want to track.  You might later wan

Re: Project Ranger

2018-05-30 Thread David Malcolm
On Tue, 2018-05-29 at 19:53 -0400, Andrew MacLeod wrote:

[...snip...]

> The code is located on an svn branch *ssa-range*.  It is based on
> trunk at revision *259405* circa mid April 2018.

Is this svn branch mirrored on gcc's git mirror?  I tried to clone it
from there, but failed.

[...snip...]

Thanks
Dave


Re: Project Ranger

2018-05-30 Thread Andrew MacLeod

On 05/30/2018 10:39 AM, David Malcolm wrote:

> On Tue, 2018-05-29 at 19:53 -0400, Andrew MacLeod wrote:
>
> [...snip...]
>
> > The code is located on an svn branch *ssa-range*.  It is based on
> > trunk at revision *259405* circa mid April 2018.
>
> Is this svn branch mirrored on gcc's git mirror?  I tried to clone it
> from there, but failed.
>
> [...snip...]
>
> Thanks
> Dave


I don't know :-)  I know that    svn diff -r 259405    worked and 
appeared to give me a diff of everything.  Aldy uses git, maybe he can 
tell you.


That was a merge from trunk, to start with, to an existing branch I had 
cut a month earlier.  The ACTUAL original branch cut was revision 
*258524*, if that makes any difference.


Andrew



Re: Project Ranger

2018-05-30 Thread Andreas Schwab
On May 30 2018, David Malcolm  wrote:

> On Tue, 2018-05-29 at 19:53 -0400, Andrew MacLeod wrote:
>
> [...snip...]
>
>> The code is located on an svn branch *ssa-range*.  It is based on
>> trunk at revision *259405* circa mid April 2018.
>
> Is this svn branch mirrored on gcc's git mirror?  I tried to clone it
> from there, but failed.

It's in refs/remotes/ssa-range, which isn't fetched by default.

Andreas.

-- 
Andreas Schwab, SUSE Labs, sch...@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."


Re: Project Ranger

2018-05-30 Thread Aldy Hernandez




On 05/30/2018 10:51 AM, Andreas Schwab wrote:

> On May 30 2018, David Malcolm  wrote:
>
> > On Tue, 2018-05-29 at 19:53 -0400, Andrew MacLeod wrote:
> >
> > [...snip...]
> >
> > > The code is located on an svn branch *ssa-range*.  It is based on
> > > trunk at revision *259405* circa mid April 2018.
> >
> > Is this svn branch mirrored on gcc's git mirror?  I tried to clone it
> > from there, but failed.
>
> It's in refs/remotes/ssa-range, which isn't fetched by default.


Right.

From my tree:

$ git branch -a |grep svn.ssa.range
  remotes/svn/ssa-range



Re: Enabling -ftree-slp-vectorize on -O2/Os

2018-05-30 Thread Richard Biener
On Wed, May 30, 2018 at 11:25 AM Richard Biener 
wrote:

> On Tue, May 29, 2018 at 5:24 PM Allan Sandfeld Jensen 
> wrote:

> > On Tuesday, 29 May 2018 16:57:56 CEST Richard Biener wrote:
> > >
> > > so the situation improves but isn't fully fixed (STLF issues maybe?)
> > >

> > That raises the question if it helps in these cases even in -O3?

> That's a good question indeed.  We might end up (partly) BB vectorizing
> loop bodies that we'd otherwise loop vectorize with SLP.  Benchmarking
> with BB vectorization disabled at -O3+ isn't something I've done in the
> past.  I'm now doing a 2-run with -march=haswell -Ofast
> [-fno-tree-slp-vectorize] for the FP benchmarks.

                  Base      Base   Base        Peak      Peak   Peak
Benchmarks        Ref.  Run Time  Ratio        Ref.  Run Time  Ratio
--------------  ------  --------  -----  -   ------  --------  -----  -
410.bwaves       13590       178   76.4  *    13590       180   75.5  S
410.bwaves       13590       180   75.6  S    13590       179   76.0  *
416.gamess       19580       604   32.4  S    19580       576   34.0  S
416.gamess       19580       604   32.4  *    19580       575   34.0  *
433.milc          9180       339   27.1  *     9180       345   26.6  S
433.milc          9180       343   26.7  S     9180       343   26.8  *
434.zeusmp        9100       234   38.9  *     9100       234   38.9  S
434.zeusmp        9100       234   38.8  S     9100       234   38.9  *
435.gromacs       7140       251   28.5  *     7140       251   28.4  *
435.gromacs       7140       252   28.3  S     7140       252   28.3  S
436.cactusADM    11950       278   43.0  S    11950       222   53.8  S
436.cactusADM    11950       223   53.7  *    11950       221   54.1  *
437.leslie3d      9400       214   43.9  *     9400       215   43.6  *
437.leslie3d      9400       217   43.3  S     9400       222   42.4  S
444.namd          8020       302   26.5  S     8020       303   26.5  S
444.namd          8020       302   26.6  *     8020       303   26.5  *
447.dealII       11440       259   44.2  *    11440       246   46.6  *
447.dealII       11440       259   44.1  S    11440       246   46.6  S
450.soplex        8340       219   38.0  *     8340       219   38.0  *
450.soplex        8340       221   37.7  S     8340       221   37.7  S
453.povray        5320       108   49.2  *     5320       109   48.7  S
453.povray        5320       108   49.1  S     5320       109   48.8  *
454.calculix      8250       270   30.6  *     8250       269   30.6  *
454.calculix      8250       271   30.5  S     8250       270   30.5  S
459.GemsFDTD     10610       308   34.5  S    10610       306   34.7  S
459.GemsFDTD     10610       306   34.7  *    10610       306   34.7  *
465.tonto         9840       428   23.0  S     9840       423   23.3  *
465.tonto         9840       426   23.1  *     9840       423   23.2  S
470.lbm          13740       253   54.4  S    13740       252   54.5  *
470.lbm          13740       252   54.5  *    13740       252   54.5  S
481.wrf          11170       265   42.1  *    11170       265   42.2  S
481.wrf          11170       266   42.1  S    11170       264   42.3  *
482.sphinx3      19490       401   48.6  *    19490       402   48.5  S
482.sphinx3      19490       405   48.1  S    19490       399   48.9  *

so we can indeed see similar detrimental effects on 416.gamess;  447.dealII
seems to improve with BB vectorization.

That means the 416.gamess slowdown is definitely worth investigating
since it reproduces with both AVX128 and AVX256 and with loop
vectorization.  I'll open a bug for it.

> Note that there were some cases where disabling vectorization wholly
> improved things.

> > Anyway it doesn't look good for it. Did the binary size at least improve
> > with prefer-avx128, or was that also worse or insignificant?

> Similar to the AVX256 results.  I guess where AVX256 applied we now simply
> do two vector ops with AVX128.

> Richard.


> > 'Allan


bootstrap failure due to declaration mismatch in r260956

2018-05-30 Thread Martin Sebor

Honza,

I think your r260956 is missing the following hunk:

Index: include/simple-object.h
===================================================================
--- include/simple-object.h (revision 260969)
+++ include/simple-object.h (working copy)
@@ -203,7 +203,7 @@ simple_object_release_write (simple_object_write *
 extern const char *
 simple_object_copy_lto_debug_sections (simple_object_read *src_object,
   const char *dest,
-  int *err);
+  int *err, int rename);

 #ifdef __cplusplus
 }


Martin


Re: bootstrap failure due to declaration mismatch in r260956

2018-05-30 Thread Gerald Pfeifer
On Wed, 30 May 2018, Martin Sebor wrote:
> I think your r260956 is missing the following hunk:

If this fixes the bootstrap for you (also ran into this myself
just now), can you please go ahead and commit?

We can always sort out things later, if there are details to be
tweaked, but fixing a bootstrap failure with a simple one-liner
like this, let's not get process-heavy and just do it. ;-)

Gerald


Re: bootstrap failure due to declaration mismatch in r260956

2018-05-30 Thread Martin Sebor

On 05/30/2018 12:27 PM, Gerald Pfeifer wrote:

> On Wed, 30 May 2018, Martin Sebor wrote:
> > I think your r260956 is missing the following hunk:
>
> If this fixes the bootstrap for you (also ran into this myself
> just now), can you please go ahead and commit?
>
> We can always sort out things later, if there are details to be
> tweaked, but fixing a bootstrap failure with a simple one-liner
> like this, let's not get process-heavy and just do it. ;-)


Jakub already committed the missing change in r260970 so bootstrap
should be working again.

Thanks
Martin



Re: bootstrap failure due to declaration mismatch in r260956

2018-05-30 Thread Jan Hubicka
> On 05/30/2018 12:27 PM, Gerald Pfeifer wrote:
> >On Wed, 30 May 2018, Martin Sebor wrote:
> >>I think your r260956 is missing the following hunk:
> >
> >If this fixes the bootstrap for you (also ran into this myself
> >just now), can you please go ahead and commit?
> >
> >We can always sort out things later, if there are details to be
> >tweaked, but fixing a bootstrap failure with a simple one-liner
> >like this, let's not get process-heavy and just do it. ;-)
> 
> Jakub already committed the missing change in r260970 so bootstrap
> should be working again.

I apologize for that. I left svn commit waiting for commit
log entry and did not notice that :(

Thanks for fixing it!
Honza
> 
> Thanks
> Martin
> 


gcc-6-20180530 is now available

2018-05-30 Thread gccadmin
Snapshot gcc-6-20180530 is now available on
  ftp://gcc.gnu.org/pub/gcc/snapshots/6-20180530/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 6 SVN branch
with the following options: svn://gcc.gnu.org/svn/gcc/branches/gcc-6-branch 
revision 260976

You'll find:

 gcc-6-20180530.tar.xz                Complete GCC

  SHA256=34b2ec9f1c047cde51d35c7aea31952f0f40006dd64df78943edfffc6294d9ff
  SHA1=9ab3ed7e4e237611dcfabf42d4e46bbb9fb5a7a4

Diffs from 6-20180523 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-6
link is updated and a message is sent to the gcc list.  Please do not use
a snapshot before it has been announced that way.