Re: Loop fusion.

2018-04-23 Thread Bin.Cheng
On Sun, Apr 22, 2018 at 3:27 PM, Toon Moene  wrote:
> A few days ago there was a rant on the Fortran Standardization Committee's
> e-mail list about Fortran's "whole array arithmetic" being unoptimizable.
>
> An example picked at random from our weather forecasting code:
>
> ZQICE(1:NPROMA,1:NFLEVG) = PGFL(1:NPROMA,1:NFLEVG,YI%MP)
> ZQLI(1:NPROMA,1:NFLEVG) = PGFL(1:NPROMA,1:NFLEVG,YL%MP)
> ZQRAIN(1:NPROMA,1:NFLEVG) = PGFL(1:NPROMA,1:NFLEVG,YR%MP)
> ZQSNOW(1:NPROMA,1:NFLEVG) = PGFL(1:NPROMA,1:NFLEVG,YS%MP)
>
> The reaction from one of the members of the committee (about "their"
> compiler):
>
> 'And multiple consecutive array statements with the same shape are “fused”
> exactly so that the compiler can generate good cache use. This sort of
> optimization is pretty low hanging fruit.'
>
> As far as I can see loop fusion as a stand-alone optimization is not
> supported as yet, although some mention is made in the context of graphite.
>
> Is this something that should be pursued ?
Hi,
I don't know the current status of fusion in graphite.  As for the
traditional fusion transformation, I don't think it would be very
difficult to implement alongside the existing loop distribution; in
fact, quite a lot of code should be shareable.  What we do need are
more motivating cases and a good, conservative cost model.
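
To make the transformation under discussion concrete, here is a minimal
C++ sketch of what fusing the four array assignments amounts to. It is
not GCC code: the Fields struct, the flattened pgfl layout (plane k in
elements [k*n, (k+1)*n)) and both function names are assumptions made up
for illustration. It only shows that the two forms compute the same
result; whether the fused form is faster is exactly what the cost model
would have to decide.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for ZQICE, ZQLI, ZQRAIN and ZQSNOW.
struct Fields {
    std::vector<double> ice, liquid, rain, snow;
};

// Unfused form: one loop per array statement.
Fields copy_unfused(const std::vector<double>& pgfl, std::size_t n) {
    assert(pgfl.size() >= 4 * n);  // assumed layout: four planes of n elements
    Fields f{std::vector<double>(n), std::vector<double>(n),
             std::vector<double>(n), std::vector<double>(n)};
    for (std::size_t i = 0; i < n; ++i) f.ice[i]    = pgfl[0 * n + i];
    for (std::size_t i = 0; i < n; ++i) f.liquid[i] = pgfl[1 * n + i];
    for (std::size_t i = 0; i < n; ++i) f.rain[i]   = pgfl[2 * n + i];
    for (std::size_t i = 0; i < n; ++i) f.snow[i]   = pgfl[3 * n + i];
    return f;
}

// Fused form: the four loops have the same bounds and no dependences
// between them, so their bodies can share a single loop.
Fields copy_fused(const std::vector<double>& pgfl, std::size_t n) {
    assert(pgfl.size() >= 4 * n);
    Fields f{std::vector<double>(n), std::vector<double>(n),
             std::vector<double>(n), std::vector<double>(n)};
    for (std::size_t i = 0; i < n; ++i) {
        f.ice[i]    = pgfl[0 * n + i];
        f.liquid[i] = pgfl[1 * n + i];
        f.rain[i]   = pgfl[2 * n + i];
        f.snow[i]   = pgfl[3 * n + i];
    }
    return f;
}
```

Calling both functions on the same input should give identical Fields;
the fused loop touches four output streams per iteration, which is where
the register-pressure and cache trade-offs come in.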

Thanks,
bin
>
> Kind regards,
>
> --
> Toon Moene - e-mail: t...@moene.org - phone: +31 346 214290
> Saturnushof 14, 3738 XG  Maartensdijk, The Netherlands
> At home: http://moene.org/~toon/; weather: http://moene.org/~hirlam/
> Progress of GNU Fortran: http://gcc.gnu.org/wiki/GFortran#news


Re: Loop fusion.

2018-04-23 Thread Richard Biener
On Sun, Apr 22, 2018 at 4:27 PM, Toon Moene  wrote:
> A few days ago there was a rant on the Fortran Standardization Committee's
> e-mail list about Fortran's "whole array arithmetic" being unoptimizable.
>
> An example picked at random from our weather forecasting code:
>
> ZQICE(1:NPROMA,1:NFLEVG) = PGFL(1:NPROMA,1:NFLEVG,YI%MP)
> ZQLI(1:NPROMA,1:NFLEVG) = PGFL(1:NPROMA,1:NFLEVG,YL%MP)
> ZQRAIN(1:NPROMA,1:NFLEVG) = PGFL(1:NPROMA,1:NFLEVG,YR%MP)
> ZQSNOW(1:NPROMA,1:NFLEVG) = PGFL(1:NPROMA,1:NFLEVG,YS%MP)
>
> The reaction from one of the members of the committee (about "their"
> compiler):
>
> 'And multiple consecutive array statements with the same shape are “fused”
> exactly so that the compiler can generate good cache use. This sort of
> optimization is pretty low hanging fruit.'
>
> As far as I can see loop fusion as a stand-alone optimization is not
> supported as yet, although some mention is made in the context of graphite.
>
> Is this something that should be pursued ?

In principle GRAPHITE can handle loop fusion but yes, standalone fusion
is sth useful.

Note that while it looks "obvious" in the above source fragment the IL
that is presented to optimizers may make things a lot less "low-hanging".

Richard.

> Kind regards,
>
> --
> Toon Moene - e-mail: t...@moene.org - phone: +31 346 214290
> Saturnushof 14, 3738 XG  Maartensdijk, The Netherlands
> At home: http://moene.org/~toon/; weather: http://moene.org/~hirlam/
> Progress of GNU Fortran: http://gcc.gnu.org/wiki/GFortran#news


Re: Loop fusion.

2018-04-23 Thread Richard Biener
On Mon, Apr 23, 2018 at 12:59 PM, Bin.Cheng  wrote:
> On Sun, Apr 22, 2018 at 3:27 PM, Toon Moene  wrote:
>> A few days ago there was a rant on the Fortran Standardization Committee's
>> e-mail list about Fortran's "whole array arithmetic" being unoptimizable.
>>
>> An example picked at random from our weather forecasting code:
>>
>> ZQICE(1:NPROMA,1:NFLEVG) = PGFL(1:NPROMA,1:NFLEVG,YI%MP)
>> ZQLI(1:NPROMA,1:NFLEVG) = PGFL(1:NPROMA,1:NFLEVG,YL%MP)
>> ZQRAIN(1:NPROMA,1:NFLEVG) = PGFL(1:NPROMA,1:NFLEVG,YR%MP)
>> ZQSNOW(1:NPROMA,1:NFLEVG) = PGFL(1:NPROMA,1:NFLEVG,YS%MP)
>>
>> The reaction from one of the members of the committee (about "their"
>> compiler):
>>
>> 'And multiple consecutive array statements with the same shape are “fused”
>> exactly so that the compiler can generate good cache use. This sort of
>> optimization is pretty low hanging fruit.'
>>
>> As far as I can see loop fusion as a stand-alone optimization is not
>> supported as yet, although some mention is made in the context of graphite.
>>
>> Is this something that should be pursued ?
> Hi,
> I don't know the current status of fusion in graphite.  As for the
> traditional fusion transformation, I don't think it would be very
> difficult to implement alongside the existing loop distribution; in
> fact, quite a lot of code should be shareable.  What we do need are
> more motivating cases and a good, conservative cost model.

Yes, I guess before distribution you want to do maximum fusion and then
apply (re-)distribution on the fused loop.  The cost model should be the
very same for distribution/fusion.

Richard.

> Thanks,
> bin
>>
>> Kind regards,
>>
>> --
>> Toon Moene - e-mail: t...@moene.org - phone: +31 346 214290
>> Saturnushof 14, 3738 XG  Maartensdijk, The Netherlands
>> At home: http://moene.org/~toon/; weather: http://moene.org/~hirlam/
>> Progress of GNU Fortran: http://gcc.gnu.org/wiki/GFortran#news


Re: Loop fusion.

2018-04-23 Thread Janne Blomqvist
On Mon, Apr 23, 2018 at 2:02 PM, Richard Biener 
wrote:

> On Mon, Apr 23, 2018 at 12:59 PM, Bin.Cheng  wrote:
> > On Sun, Apr 22, 2018 at 3:27 PM, Toon Moene  wrote:
> >> A few days ago there was a rant on the Fortran Standardization Committee's
> >> e-mail list about Fortran's "whole array arithmetic" being unoptimizable.
> >>
> >> An example picked at random from our weather forecasting code:
> >>
> >> ZQICE(1:NPROMA,1:NFLEVG) = PGFL(1:NPROMA,1:NFLEVG,YI%MP)
> >> ZQLI(1:NPROMA,1:NFLEVG) = PGFL(1:NPROMA,1:NFLEVG,YL%MP)
> >> ZQRAIN(1:NPROMA,1:NFLEVG) = PGFL(1:NPROMA,1:NFLEVG,YR%MP)
> >> ZQSNOW(1:NPROMA,1:NFLEVG) = PGFL(1:NPROMA,1:NFLEVG,YS%MP)
> >>
> >> The reaction from one of the members of the committee (about "their"
> >> compiler):
> >>
> >> 'And multiple consecutive array statements with the same shape are “fused”
> >> exactly so that the compiler can generate good cache use. This sort of
> >> optimization is pretty low hanging fruit.'
> >>
> >> As far as I can see loop fusion as a stand-alone optimization is not
> >> supported as yet, although some mention is made in the context of graphite.
> >>
> >> Is this something that should be pursued ?
> > Hi,
> > I don't know the current status of fusion in graphite.  As for the
> > traditional fusion transformation, I don't think it would be very
> > difficult to implement alongside the existing loop distribution; in
> > fact, quite a lot of code should be shareable.  What we do need are
> > more motivating cases and a good, conservative cost model.
>
> Yes, I guess before distribution you want to do maximum fusion and then
> apply (re-)distribution on the fused loop.  The cost model should be the
> very same for distribution/fusion.
>
> Richard.
>


I recall Fujitsu bragging that the key to them getting good application
performance (read: outside linpack) on the K computer is extensive use of
loop FISSION + software pipelining. Though I guess sw-pipelining is only
useful if you have lots of architectural registers, which disqualifies
x86-64..


-- 
Janne Blomqvist


Re: Loop fusion.

2018-04-23 Thread Richard Biener
On Mon, Apr 23, 2018 at 2:31 PM, Janne Blomqvist
 wrote:
> On Mon, Apr 23, 2018 at 2:02 PM, Richard Biener 
> wrote:
>>
>> On Mon, Apr 23, 2018 at 12:59 PM, Bin.Cheng  wrote:
>> > On Sun, Apr 22, 2018 at 3:27 PM, Toon Moene  wrote:
>> >> A few days ago there was a rant on the Fortran Standardization
>> >> Committee's
>> >> e-mail list about Fortran's "whole array arithmetic" being
>> >> unoptimizable.
>> >>
>> >> An example picked at random from our weather forecasting code:
>> >>
>> >> ZQICE(1:NPROMA,1:NFLEVG) = PGFL(1:NPROMA,1:NFLEVG,YI%MP)
>> >> ZQLI(1:NPROMA,1:NFLEVG) = PGFL(1:NPROMA,1:NFLEVG,YL%MP)
>> >> ZQRAIN(1:NPROMA,1:NFLEVG) = PGFL(1:NPROMA,1:NFLEVG,YR%MP)
>> >> ZQSNOW(1:NPROMA,1:NFLEVG) = PGFL(1:NPROMA,1:NFLEVG,YS%MP)
>> >>
>> >> The reaction from one of the members of the committee (about "their"
>> >> compiler):
>> >>
>> >> 'And multiple consecutive array statements with the same shape are
>> >> “fused”
>> >> exactly so that the compiler can generate good cache use. This sort of
>> >> optimization is pretty low hanging fruit.'
>> >>
>> >> As far as I can see loop fusion as a stand-alone optimization is not
>> >> supported as yet, although some mention is made in the context of
>> >> graphite.
>> >>
>> >> Is this something that should be pursued ?
>> > Hi,
>> > I don't know the current status of fusion in graphite.  As for the
>> > traditional fusion transformation, I don't think it would be very
>> > difficult to implement alongside the existing loop distribution; in
>> > fact, quite a lot of code should be shareable.  What we do need are
>> > more motivating cases and a good, conservative cost model.
>>
>> Yes, I guess before distribution you want to do maximum fusion and then
>> apply (re-)distribution on the fused loop.  The cost model should be the
>> very same for distribution/fusion.
>>
>> Richard.
>
>
>
> I recall Fujitsu bragging that the key to them getting good application
> performance (read: outside linpack) on the K computer is extensive use of
> loop FISSION + software pipelining. Though I guess sw-pipelining is only
> useful if you have lots of architectural registers, which disqualifies
> x86-64..

FISSION we can do quite well (though we lack a cost model here), that's
what loop distribution does.
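
As a rough illustration of what loop distribution (fission) does, here
is a small C++ sketch. It is hand-written, not compiler output; the
function names are hypothetical and both vectors are assumed to have
the same length.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Original loop: two statements that do not depend on each other.
void update_fused(std::vector<double>& a, std::vector<double>& b) {
    assert(a.size() == b.size());  // assumption of this sketch
    for (std::size_t i = 0; i < a.size(); ++i) {
        a[i] = a[i] * 2.0;
        b[i] = b[i] + 1.0;
    }
}

// Distributed form: each statement gets its own loop over the same range.
// This is legal here because neither statement reads what the other writes.
void update_distributed(std::vector<double>& a, std::vector<double>& b) {
    assert(a.size() == b.size());
    const std::size_t n = a.size();
    for (std::size_t i = 0; i < n; ++i) a[i] = a[i] * 2.0;
    for (std::size_t i = 0; i < n; ++i) b[i] = b[i] + 1.0;
}
```

Both functions leave a and b in the same state; deciding which form is
preferable for a given target is the job of the missing cost model.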

Richard.

>
> --
> Janne Blomqvist


bug ? : -Wpedantic -Wconversion 'short a=1; a-=1;' complaint

2018-04-23 Thread Jason Vas Dias

I really do not think a '-Wpedantic -Wconversion' warning should
be generated for the following code, but it is
(with GCC 6.4.1 and 7.3.1 on RHEL-7.5 Linux) :

 $ echo '
 typedef unsigned short U16_t;
 static void f(void)
 { U16_t a = 1;
   a-=1;
 }' > t.C;

 $ g++ -std=c++14 -Wall -Wextra -pedantic -Wc++11-compat \
   -Wconversion -Wcast-align -Wcast-qual -Wfloat-equal \
   -Wmissing-declarations -Wlogical-op -Wpacked -Wundef \
   -Wuninitialized -Wvariadic-macros -Wwrite-strings \
   -c t.C -o /dev/null
 t.C:4:8: conversion to 'U16_t' {aka short unsigned int} from 'int' may \
  alter its value.

I don't control the warning flags shown above; my code must
compile with them without producing any warnings.

But I think the warning issued above is a GCC bug:
when I look at the generated code (compile with -S -o t.S),
I see it actually does generate a 16-bit subtraction, which
is what I wanted:

        .file   "t.C"
        .text
        .globl  _Z1fv
        .type   _Z1fv, @function
_Z1fv:
.LFB0:
        .cfi_startproc
        pushq   %rbp
        .cfi_def_cfa_offset 16
        .cfi_offset 6, -16
        movq    %rsp, %rbp
        .cfi_def_cfa_register 6
        movw    $1, -2(%rbp)
        subw    $1, -2(%rbp)
        #    this is a two-byte word subtract
        nop
        popq    %rbp
        .cfi_def_cfa 7, 8
        ret
        .cfi_endproc
.LFE0:
        .size   _Z1fv, .-_Z1fv
        .ident  "GCC: (GNU) 6.4.1 20180321"
        .section        .note.GNU-stack,"",@progbits



So why is it generating a warning about conversion from 'int' ?

I don't see any conversion going on in the assembler output. That is,
the compiler is NOT casting 'a' to a temporary int, doing a 32-bit
integer subtraction of '1' from it, and storing the truncated low 16
bits back in 'a'.
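
For context, a small sketch of where the 'int' in the diagnostic comes
from at the language level, independent of the instructions finally
emitted. Under the C++ integral promotions, both operands of 'a - 1'
are promoted to int on platforms where int can represent every unsigned
short value (assumed here), so storing the result back into a U16_t is
a narrowing conversion. The decrement and check_decrement helpers are
hypothetical names for illustration; writing the cast out explicitly is
a commonly used way to state that the narrowing is intended.

```cpp
#include <cassert>
#include <type_traits>

typedef unsigned short U16_t;

// Assumes int is wider than unsigned short (e.g. 16-bit short, 32-bit int):
// the expression `a - 1` then has type int, not U16_t.
static_assert(std::is_same<decltype(U16_t{} - 1), int>::value,
              "operands of '-' are promoted to int");

// Hypothetical helper: same arithmetic as `a -= 1`, with the narrowing
// from int back to U16_t written out explicitly.
inline U16_t decrement(U16_t a) {
    return static_cast<U16_t>(a - 1);
}

// Tiny self-check of the helper.
inline void check_decrement() {
    assert(decrement(1) == 0);
}
```

The compiler remains free to implement this with a 16-bit subtract, as
in the listing above, because the observable result is the same.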

I'd like to remove either '-pedantic' or '-Wconversion' from the
warning flags, but this is not an option .

Please can GCC fix this warning bug eventually - I have to wade
through code that generates thousands of them per compilation.

Thanks & Best Regards,
Jason




Re: bug ? : -Wpedantic -Wconversion 'short a=1; a-=1;' complaint

2018-04-23 Thread Jonathan Wakely
On 23 April 2018 at 15:11, Jason Vas Dias wrote:
> Please can GCC fix this warning bug eventually - I have to wade
> through code that generates thousands of them per compilation.

gcc@gcc.gnu.org is for discussing development of GCC, not bugs.
gcc-b...@gcc.gnu.org is for automated emails generated from Bugzilla.
Cross-posting to those lists is never appropriate.

If you want to report a bug then please use Bugzilla, as per
https://gcc.gnu.org/bugs/

If you want to ask questions about using GCC use the gcc-h...@gcc.gnu.org list.

Bug reports to gcc@gcc.gnu.org don't get filed in Bugzilla and so
don't get fixed.


Re: style of code examples in changes.html

2018-04-23 Thread David Malcolm
On Mon, 2018-04-16 at 20:34 -0600, Martin Sebor wrote:
> Hi David & Gerald,

(sorry for the late response; I was offline on vacation last week)

> I noticed that the coding examples in the updates I committed
> to changes.html use a different formatting style than David's.
> I just copied mine from GCC 7 changes.html, and those I copied
> from David's for that version :)  

There are at least two kinds of example in the website:
(a) source code examples, and
(b) "screenshots" of gcc output, which can themselves contain code
output as part of a diagnostic.

I got sick of hand-converting (b) to our HTML tags, so I wrote a script
to do it, which I used for my gcc-8/changes.html.

The script is in the website's CVS repository as:
  bin/gcc-color-to-html.py
and can be run like this:

LANG=C \
  gcc $@ \
    -fdiagnostics-color=always 2>&1 \
  | ./bin/gcc-color-to-html.py

See
  https://gcc.gnu.org/ml/gcc-patches/2018-04/msg00186.html

I also added a
  
  
around the output, though this isn't done by the above script.

I actually had a fair bit more scripting than this, based on the
scripting I did for my blogpost here:
  https://github.com/davidmalcolm/gcc-8-blogpost/blob/master/blog.html.in
where lines like:

INVOKE_GCC unclosed.c

in a foo.html.in get turned into a "screenshot" of the pertinent gcc
invocation in the foo.html.  But given that we don't want to require
running gcc itself to build the website (and indeed, specific gcc
versions), I just used this to generate the patch.

> Should we make an effort to
> make them all look the same?

Naturally, for (b), I favor the new style I used :)  (using the black
background, which may be enough to get the same look).

I'm not sure if we want to use it for (a).

> FWIW, I didn't notice the difference until my changes published.
> I'm guessing that's because the style sheet the page uses isn't
> referenced from the original document and the reference is only
> added by Gerald's script.  Is there a simple way to set things
> up so we can see our changes as they will appear when published?

I've been adding these lines to the <head> of the page:
  
  
while testing the content.

Hope this is helpful
Dave


Re: Loop fusion.

2018-04-23 Thread Toon Moene

On 04/23/2018 01:00 PM, Richard Biener wrote:


> On Sun, Apr 22, 2018 at 4:27 PM, Toon Moene  wrote:
>> A few days ago there was a rant on the Fortran Standardization Committee's
>> e-mail list about Fortran's "whole array arithmetic" being unoptimizable.
>>
>> An example picked at random from our weather forecasting code:
>>
>> ZQICE(1:NPROMA,1:NFLEVG) = PGFL(1:NPROMA,1:NFLEVG,YI%MP)
>> ZQLI(1:NPROMA,1:NFLEVG) = PGFL(1:NPROMA,1:NFLEVG,YL%MP)
>> ZQRAIN(1:NPROMA,1:NFLEVG) = PGFL(1:NPROMA,1:NFLEVG,YR%MP)
>> ZQSNOW(1:NPROMA,1:NFLEVG) = PGFL(1:NPROMA,1:NFLEVG,YS%MP)
>>
>> The reaction from one of the members of the committee (about "their"
>> compiler):
>>
>> 'And multiple consecutive array statements with the same shape are “fused”
>> exactly so that the compiler can generate good cache use. This sort of
>> optimization is pretty low hanging fruit.'
>>
>> As far as I can see loop fusion as a stand-alone optimization is not
>> supported as yet, although some mention is made in the context of graphite.
>>
>> Is this something that should be pursued ?
>
> In principle GRAPHITE can handle loop fusion but yes, standalone fusion
> is sth useful.
>
> Note that while it looks "obvious" in the above source fragment the IL
> that is presented to optimizers may make things a lot less "low-hanging".


Well, the loops are generated by the front end, so I *assume* they are 
basically the same ...


Probably the largest problem to address is the heuristic for preventing 
register pressure going through the roof ...


--
Toon Moene - e-mail: t...@moene.org - phone: +31 346 214290
Saturnushof 14, 3738 XG  Maartensdijk, The Netherlands
At home: http://moene.org/~toon/; weather: http://moene.org/~hirlam/
Progress of GNU Fortran: http://gcc.gnu.org/wiki/GFortran#news


r9 on ARM32?

2018-04-23 Thread Jay K
I'm wondering what the role of r9 is on ARM32, on Linux and Android.
On Apple platforms it is documented as having been reserved long ago,
but these days it is available as a scratch register.

I've looked around a bit but haven't gotten the full answer.

It is "the PIC register", I see.

What does that imply? Volatile? Non-volatile?

In particular I'm looking for a spare register in which to pass an extra
"special" parameter: one that can be considered volatile and is never
otherwise used to carry a parameter.

Most ABIs have a few candidates, but arm32 comes up relatively short.

Intra procedural scratch (r12) probably cannot work for me.
I know gcc uses it for nested function context and that is laudable. I wish I 
could guarantee no code between me setting it and it being consumed.

And if it is volatile, I'd want the dynamic linker stubs to still preserve it 
incoming.

Thank you,
 - Jay


-g and -fvar-tracking

2018-04-23 Thread Sandra Loosemore
Can somebody remind me why using -g doesn't also enable -fvar-tracking 
by default?  At least for -g2, which is supposed to emit debug 
information about local variables?  It seems kind of counterintuitive to 
me that specifying a -O option enables a pass to collect better debug 
information but specifying -g to request debuggable code doesn't.  :-S


-Sandra