GCC 6.5 Status Report (2018-10-12)

2018-10-12 Thread Jakub Jelinek
Status
======

It is now time to release GCC 6.5 and close the 6.x branch.
If you have regression bugfixes or documentation fixes that should
still be backported to the branch, please test them and check them in
before Friday, October 19th, when I'd like to create a Release Candidate
of 6.5.


Quality Data
============

Priority          #   Change from last report
--------        ---   -----------------------
P1                0
P2              210   +  35
P3               28   +  20
P4              130   -   4
P5               28   -   1
--------        ---   -----------------------
Total P1-P3     238   +  55
Total           396   +  50


Previous Report
===============

https://gcc.gnu.org/ml/gcc/2017-07/msg5.html


Can support TRUNC_DIV_EXPR, TRUNC_MOD_EXPR in GCC vectorization/scalar evolution -- and/or linearization?

2018-10-12 Thread Thomas Schwinge
Hi!

I'm for the first time looking into the existing vectorization
functionality in GCC (yay!), and with that I'm also for the first time
encountering GCC's scalar evolution (scev) machinery (yay!), and the
chains of recurrences (chrec) used by that (yay!).

Obviously, I'm right now doing my own reading and experimenting, but
maybe somebody can cut that short, if my current question doesn't make
much sense, and is thus easily answered:

    int a[N_J][N_I];

    #pragma acc loop collapse(2)
    for (int j = 0; j < N_J; ++j)
      for (int i = 0; i < N_I; ++i)
        a[j][i] = 0;

Without "-fopenacc" (thus the pragma ignored), this does vectorize (for
the x86_64 target, for example, without OpenACC code offloading), and
also does it vectorize with "-fopenacc" enabled but the "collapse(2)"
clause removed and instead another "#pragma acc loop" added in front of
the inner "i" loop.  But with the "collapse(2)" clause in effect, these
two nested loops get, well, "collapse"d by omp-expand into one:

    for (int tmp = 0; tmp < N_J * N_I; ++tmp)
      {
        int j = tmp / N_I;
        int i = tmp % N_I;
        a[j][i] = 0;
      }

This does not vectorize because of scalar evolution running into
unhandled (chrec_dont_know) TRUNC_DIV_EXPR and TRUNC_MOD_EXPR in
gcc/tree-scalar-evolution.c:interpret_rhs_expression.  Do I have a chance
in teaching it to handle these, without big effort?
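
To make the rewriting concrete, here is a minimal sketch (not the code
omp-expand actually generates) that checks that the collapsed form visits
the same elements in the same order as the original nest.  The extents
and the helper names fill_nested, fill_collapsed and
collapse_is_equivalent are hypothetical, chosen only for illustration.

```c
#include <string.h>

enum { N_J = 3, N_I = 5 };  /* small, hypothetical extents */

/* Reference: the original loop nest; each element records its visit order. */
static void fill_nested(int a[N_J][N_I])
{
    int k = 0;
    for (int j = 0; j < N_J; ++j)
        for (int i = 0; i < N_I; ++i)
            a[j][i] = k++;
}

/* Collapsed form: one counter, with j and i recovered by / and %. */
static void fill_collapsed(int a[N_J][N_I])
{
    for (int tmp = 0; tmp < N_J * N_I; ++tmp) {
        int j = tmp / N_I;
        int i = tmp % N_I;
        a[j][i] = tmp;
    }
}

/* Returns 1 if both forms write the same value to every element. */
static int collapse_is_equivalent(void)
{
    int x[N_J][N_I], y[N_J][N_I];
    fill_nested(x);
    fill_collapsed(y);
    return memcmp(x, y, sizeof x) == 0;
}
```

The division and modulo in fill_collapsed are exactly the
TRUNC_DIV_EXPR and TRUNC_MOD_EXPR that scalar evolution gives up on.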


If that's not reasonable, I shall look for other options to address the
problem that currently vectorization gets pessimized by "-fopenacc" and
in particular the code rewriting for the "collapse" clause.


By the way, the problem can, similarly, also be displayed in an OpenMP
example, where also when such a "collapse" clause is present, the inner
loop's code no longer vectorizes.  (But I've not considered that case in
any more detail; Jakub CCed in case that's something to look into?  I
don't know how OpenMP threads' loop iterations are meant to interact with
OpenMP SIMD, basically.)


Hmm, and without any OpenACC/OpenMP etc., actually the same problem is
also present when running the following code through the vectorizer:

    for (int tmp = 0; tmp < N_J * N_I; ++tmp)
      {
        int j = tmp / N_I;
        int i = tmp % N_I;
        a[j][i] = 0;
      }

... whereas the following variant (obviously) does vectorize:

    int a[N_J * N_I];

    for (int tmp = 0; tmp < N_J * N_I; ++tmp)
      a[tmp] = 0;

Hmm.  Linearization.  From a quick search, I found some 2010 work by
Sebastian Pop on that topic, in the Graphite context
(gcc/graphite-flattening.c), but that got pulled out again in 2012.
(I have not yet looked up the history, and have not yet looked whether
that'd be relevant here at all -- and we're not using Graphite here.)

Regarding that, am I missing something obvious?


Grüße
 Thomas


Re: Can support TRUNC_DIV_EXPR, TRUNC_MOD_EXPR in GCC vectorization/scalar evolution -- and/or linearization?

2018-10-12 Thread Marc Glisse

On Fri, 12 Oct 2018, Thomas Schwinge wrote:


> Hmm, and without any OpenACC/OpenMP etc., actually the same problem is
> also present when running the following code through the vectorizer:
>
>     for (int tmp = 0; tmp < N_J * N_I; ++tmp)
>       {
>         int j = tmp / N_I;
>         int i = tmp % N_I;
>         a[j][i] = 0;
>       }
>
> ... whereas the following variant (obviously) does vectorize:
>
>     int a[N_J * N_I];
>
>     for (int tmp = 0; tmp < N_J * N_I; ++tmp)
>       a[tmp] = 0;


I had a quick look at the difference, and a[j][i] remains in this form 
throughout optimization. If I write instead *((*(a+j))+i) = 0; I get


  j_10 = tmp_17 / 1025;
  i_11 = tmp_17 % 1025;
  _1 = (long unsigned int) j_10;
  _2 = _1 * 1025;
  _3 = (sizetype) i_11;
  _4 = _2 + _3;

or for a power of 2

  j_10 = tmp_17 >> 10;
  i_11 = tmp_17 & 1023;
  _1 = (long unsigned int) j_10;
  _2 = _1 * 1024;
  _3 = (sizetype) i_11;
  _4 = _2 + _3;

and in both cases we fail to notice that _4 = (sizetype) tmp_17; (at least 
I think that's true).


So there are missing match.pd transformations in addition to whatever 
scev/ivdep/other work is needed.
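
The identity behind that observation can be checked by brute force.  The
following is only a sketch over non-negative tmp; the helper names
flat_index and identity_holds are made up for this example and are not
GCC functions.

```c
/* Recombine the row and column indices the way the address computation
   in the dumps does: _4 = (sizetype) j * n + (sizetype) i.  */
static unsigned long flat_index(int tmp, int n)
{
    int j = tmp / n;
    int i = tmp % n;
    return (unsigned long) j * (unsigned long) n + (unsigned long) i;
}

/* Returns 1 if flat_index(tmp, n) == tmp for every 0 <= tmp < limit.  */
static int identity_holds(int n, int limit)
{
    for (int tmp = 0; tmp < limit; ++tmp)
        if (flat_index(tmp, n) != (unsigned long) tmp)
            return 0;
    return 1;
}
```

For example, identity_holds(1025, 1025 * 64) and identity_holds(1024,
1 << 16) cover the two divisors used in the dumps, over a limited range
of tmp.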


--
Marc Glisse


Re: Can support TRUNC_DIV_EXPR, TRUNC_MOD_EXPR in GCC vectorization/scalar evolution -- and/or linearization?

2018-10-12 Thread Jakub Jelinek
On Fri, Oct 12, 2018 at 07:35:09PM +0200, Thomas Schwinge wrote:
> int a[N_J][N_I];
> 
> #pragma acc loop collapse(2)
> for (int j = 0; j < N_J; ++j)
>   for (int i = 0; i < N_I; ++i)
>     a[j][i] = 0;

For example, for
int a[128][128];

void
foo (int m, int n)
{
  #pragma omp for simd collapse(2)
  for (int i = 0; i < m; i++)
    for (int j = 0; j < n; j++)
      a[i][j]++;
}
we emit in the inner loop:
   :
  i = i.0;
  j = j.1;
  _1 = a[i][j];
  _2 = _1 + 1;
  a[i][j] = _2;
  .iter.4 = .iter.4 + 1;
  j.1 = j.1 + 1;
  D.2912 = j.1 < n.7 ? 0 : 1;
  i.0 = D.2912 + i.0;
  j.1 = j.1 < n.7 ? j.1 : 0;
  
   :
  if (.iter.4 < D.2902)
    goto ; [87.50%]
  else
    goto ; [12.50%]
to make it more vectorization friendly (though in this particular case it
isn't vectorized either) and to avoid the expensive % and / operations
inside the inner loop.  Without -fopenmp, only the inner loop is
vectorized, since no collapse takes place.
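
Written back as plain C, the dump above corresponds roughly to the
following sketch.  It is a hand-written approximation, not compiler
output; the names bump_collapsed and collapsed_matches_nest are
hypothetical, and m and n are assumed to be at most 128.

```c
enum { ROWS = 128, COLS = 128 };
static int a[ROWS][COLS];

/* One collapsed loop over m * n iterations.  The indices are advanced
   incrementally with a carry from j into i, so the loop body needs
   neither / nor %.  */
static void bump_collapsed(int m, int n)
{
    if (m <= 0 || n <= 0)
        return;
    long total = (long) m * n;
    int i = 0, j = 0;
    for (long iter = 0; iter < total; ++iter) {
        a[i][j]++;
        j = j + 1;
        int carry = j < n ? 0 : 1;   /* D.2912 in the dump */
        i = carry + i;
        j = j < n ? j : 0;           /* wrap j at the end of a row */
    }
}

/* Returns 1 if the collapsed loop incremented exactly the elements
   a[0..m-1][0..n-1] once each, as the original nest would.  */
static int collapsed_matches_nest(int m, int n)
{
    for (int i = 0; i < ROWS; ++i)
        for (int j = 0; j < COLS; ++j)
            a[i][j] = 0;
    bump_collapsed(m, n);
    for (int i = 0; i < ROWS; ++i)
        for (int j = 0; j < COLS; ++j) {
            int expected = (i < m && j < n) ? 1 : 0;
            if (a[i][j] != expected)
                return 0;
        }
    return 1;
}
```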

Jakub


RFC: allowing compound assignment operators with designated initializers

2018-10-12 Thread Rasmus Villemoes
This is something I've sometimes found myself wishing was supported. The
idea being that one can say

unsigned a[] = { [0] = 1, [1] = 3, [0] |= 4, ...}

which would end up initializing a[0] to 5. As a somewhat realistic
example, suppose one is trying to build a bitmap at compile time, but
the bits to set are not really known in the sense that one can group
those belonging to each index in a usual | expression. Something like

#define _(e) [e / 8] |= 1 << (e % 8)
const u8 error_bitmap[] = { _(EINVAL), _(ENAMETOOLONG), _(EBUSY), ... }

Writing a small program to generate such a table as part of the build is
not practical in a cross-compile setting (because the constants may only
really be known to the cross-compiler, e.g. the errno values above).

I think the rules would be rather intuitive: If a compound assignment is
used for an element that doesn't have a previous ordinary initializer,
the LHS is 0. Any later ordinary initializer wipes all previous
operations. No operator precedence; the new value is computed
immediately and used as LHS in subsequent operations.
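
Since the proposed syntax does not exist, the rules can only be
illustrated by emulating them at run time.  The sketch below is one
reading of the rules above; the types and names (op_kind, init_op,
apply_inits, example_first_element) are hypothetical, and only "=" and
"|=" are modelled.

```c
#include <stddef.h>

enum op_kind { OP_SET, OP_OR };   /* "=" and "|=" only */

struct init_op {
    size_t index;
    enum op_kind kind;
    unsigned value;
};

/* Apply the initializers strictly left to right.  Elements start at 0,
   an ordinary "=" wipes whatever came before, and "|=" uses the current
   value as its left-hand side.  */
static void apply_inits(unsigned *arr, size_t len,
                        const struct init_op *ops, size_t nops)
{
    for (size_t k = 0; k < len; ++k)
        arr[k] = 0;
    for (size_t k = 0; k < nops; ++k) {
        if (ops[k].index >= len)
            continue;             /* out of range: ignored in this sketch */
        if (ops[k].kind == OP_SET)
            arr[ops[k].index] = ops[k].value;
        else
            arr[ops[k].index] |= ops[k].value;
    }
}

/* Mirrors { [0] = 1, [1] = 3, [0] |= 4 } from the text.  */
static unsigned example_first_element(void)
{
    static const struct init_op ops[] = {
        { 0, OP_SET, 1 }, { 1, OP_SET, 3 }, { 0, OP_OR, 4 }
    };
    unsigned a[2];
    apply_inits(a, 2, ops, sizeof ops / sizeof ops[0]);
    return a[0];
}
```

Under these rules example_first_element() yields 5, matching the a[0]
value given for the first example.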

I'm not sure how to define what happens in unions, but I also don't even
know what the current rules are in a case like

union u { char c; int i; } = { .i = 0x11223344, .c = 0x55 }

where one initializes a smaller member after a larger.

Another issue is how to handle side effects in the RHS. It's probably
consistent with the current behaviour to discard all side effects prior
to the last ordinary initializer, and to do the side effects from all
the expressions that did end up affecting the final value. (Btw., the
current documentation doesn't talk about how this interacts with range
initializers, e.g. [0 ... 5] = x++, [2 ... 6] = y++, [0 ... 4] = z++, does x++
happen?) But for automatic variables, one might as well do the compound
operations in code after the declaration, so it would be fine just
allowing this extension for static initialization.

Rasmus


gcc-8-20181012 is now available

2018-10-12 Thread gccadmin
Snapshot gcc-8-20181012 is now available on
  ftp://gcc.gnu.org/pub/gcc/snapshots/8-20181012/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 8 SVN branch
with the following options: svn://gcc.gnu.org/svn/gcc/branches/gcc-8-branch 
revision 265135

You'll find:

 gcc-8-20181012.tar.xz                Complete GCC

  SHA256=16141cc7ffcc0a767d3328043493ab3419c256321d4beae02114ac7c50ae2071
  SHA1=008bf99a0c662b7b556df7f73de88543a4abcdce

Diffs from 8-20181005 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-8
link is updated and a message is sent to the gcc list.  Please do not use
a snapshot before it has been announced that way.


About GSOC.

2018-10-12 Thread Tejas Joshi
Hello.
I reached out earlier asking about the GCC GSoC project on adding and
folding functions like roundeven. I could not apply for the idea this
year, but I am interested in the project and really hope it will be
carried forward. Since I've been studying the source code and the
project, I think working on this from now on would give me a head start
and some hands-on experience with the source code.

I did study .
It says that roundeven rounds its argument to the nearest integral
value, with ties going to even (least significant bit 0), returning an
integral value provided that the resulting value is exact.
So, to start, I'd be implementing this functionality for roundeven.
As was said in earlier mails, similar functions like real_ceil are
implemented in real.c and are used in fold-const-call.c.
Roundeven might be implemented in a similar way. Is it a built-in
(internal) function, meaning it is not to be exposed to the end user?
Studying some functions like real_ceil, there are checks at the call
sites (flag_errno_math), so I believe something similar would be needed
for roundeven.
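
As a reference point for the semantics described above, here is a small
stand-alone sketch of round-to-nearest, ties-to-even for double. It is
not GCC code and does not use REAL_VALUE_TYPE; the name roundeven_sketch
is hypothetical, and unlike a real implementation it does not preserve
the sign of a zero result.

```c
/* Round x to the nearest integral value, with ties going to the even
   neighbour.  Assumes IEEE double (53-bit significand).  */
static double roundeven_sketch(double x)
{
    /* NaN, infinities and values of magnitude >= 2^52 are returned
       unchanged: the finite ones are already integral.  */
    if (!(x > -4503599627370496.0 && x < 4503599627370496.0))
        return x;

    long long t = (long long) x;    /* truncates toward zero */
    double frac = x - (double) t;   /* fractional part, in (-1, 1) */

    if (frac > 0.5 || (frac == 0.5 && (t & 1)))
        t += 1;                     /* round up, or tie with odd t */
    else if (frac < -0.5 || (frac == -0.5 && (t & 1)))
        t -= 1;                     /* round down, or tie with odd t */

    return (double) t;
}
```

With this sketch 2.5 rounds to 2.0 and 3.5 rounds to 4.0, whereas the
usual round function would give 3.0 and 4.0.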

In real.c, where real_ceil is implemented, there are function calls
(and implementations) like do_fix_trunc, which in turn call functions
like decimal_do_fix_trunc (maybe the main functionality of
do_fix_trunc? The rest are just checks, like NaN or qNaN). I did not
really understand these functions and what they do. I also did not
understand the structure of REAL_VALUE_TYPE (r->cl etc.?).

Also, when do real.c and fold-const-call.c come into the picture in
the flow of GCC (is it for GIMPLE-level instruction selection, i.e.
GIMPLE statement to corresponding RTL instruction)?
Thanks.

Regards,
-Tejas