Re: GCC 4.8.4 Status Report (2014-12-05)

2014-12-17 Thread Dominique Dhumieres
Currently gcc 4.8.4 does not bootstrap on darwin14 (Yosemite) due to pr61407.
Would it be possible to apply the patch at 
https://gcc.gnu.org/bugzilla/attachment.cgi?id=33932
before 4.8.4 is released? Results with the patch are posted at
https://gcc.gnu.org/ml/gcc-testresults/2014-12/msg02096.html.

TIA

Dominique


Re: GCC 4.8.4 Status Report (2014-12-05)

2014-12-17 Thread FX
> Currently gcc 4.8.4 does not bootstrap on darwin14 (Yosemite) due to pr61407.
> Would it be possible to apply the patch at 
> https://gcc.gnu.org/bugzilla/attachment.cgi?id=33932
> before 4.8.4 is released? Results with the patch are posted at
> https://gcc.gnu.org/ml/gcc-testresults/2014-12/msg02096.html.

If Mike thinks it’s a good idea, I’ll do it. I’ve tested 4.8 with it multiple 
times, and it works well.

FX

Re: trying out openacc 2.0

2014-12-17 Thread Tobias Burnus
Mark Farnell wrote:
> So what parameters will I need to pass to ./configure if I want to
> support PTX offloading?

Pre-remark: I think that the https://gcc.gnu.org/wiki/Offloading page will be
updated once the support has been merged to the trunk.

I think you use the triplet "nvptx-unknown-none" instead of
"x86_64-intelmicemul-linux-gnu"; see
https://gcc.gnu.org/wiki/Offloading#A1._Building_accel_compiler
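Concretely, following the wiki page, the two configure runs might look like
this (a sketch; the prefix and the host triplet are illustrative, and the
nvptx-tools "as"/"ld" wrappers must already be installed):

```shell
# 1) Configure the accel (offload) compiler for PTX:
../gcc/configure \
    --target=nvptx-unknown-none \
    --enable-as-accelerator-for=x86_64-pc-linux-gnu \
    --prefix=/opt/gcc-offload

# 2) Configure the host compiler so it knows about the accel back end:
../gcc/configure \
    --enable-offload-targets=nvptx-unknown-none \
    --prefix=/opt/gcc-offload
```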


> So if I want to have CPU, KNL and PTX, do I end up building three compilers?

That's my understanding. You then generate, for the offloading/target section,
code for the host (as fallback) and for one or more accelerators. At
invocation time, you then decide which accelerator to use (KNL, PTX, or
host fallback), assuming that you targeted both accelerators during
compilation.


> Finally, is the nvptx-tools project mentioned in Tobias's page aiming
> at replacing the CUDA toolchain?

That depends on what you mean by "CUDA toolchain"; the purpose of those tools
is described at https://gcc.gnu.org/ml/gcc-patches/2014-10/msg03372.html

Namely, "as" just does some reordering and "ld" reduces the number of files
by combining the PTX files. The actual 'assembler' is in the CUDA runtime
library.

However, GCC (plus those few auxiliary tools) does replace the compilers
(nvcc etc.), since that task is handled by GCC itself.


>> Also, are other GPUs such as the AMD ATI and the built-in GPUs such as
>> the Intel GPU and AMD fusion supported?

There was some work underway to support OpenACC with OpenCL as output,
which is then fed to the OpenCL runtime library. The OpenACC part of
that work ended up in gomp-4_0-branch and is hence not lost. I don't
recall whether there was a branch or patch for the OpenCL support part.

For AMD's HSA, see Jakub's email.

Tobias


Re: GCC 4.8.4 Status Report (2014-12-05)

2014-12-17 Thread Jakub Jelinek
On Wed, Dec 17, 2014 at 11:16:18AM +0100, Dominique Dhumieres wrote:
> Currently gcc 4.8.4 does not bootstrap on darwin14 (Yosemite) due to pr61407.

Why has it not been pushed in earlier?
I guess if you test it sufficiently and the Darwin maintainer acks it,
perhaps.  Note that the patch violates the coding conventions by using
the CamelCase identifier minorDigitIdx.

Jakub


Re: [RFC] GCC vector extension: binary operators vs. differing signedness

2014-12-17 Thread Richard Biener
On Tue, Dec 16, 2014 at 7:42 PM, Ulrich Weigand  wrote:
> Richard Biener wrote:
>> On Fri, Dec 12, 2014 at 1:08 PM, Ulrich Weigand  wrote:
>> > Richard Biener wrote:
>> >> On Thu, Dec 11, 2014 at 4:04 PM, Ulrich Weigand wrote:
>> >> > However, if we make that change, there will be some cases that regress:
>> >> > the problem is that an expression "x + y" has *one* result type, and
>> >> > some things you do with the result will require that type to match
>> >> > precisely (including signedness).  So *any* change that affects what
>> >> > that result type is will regress some code that happened to rely on
>> >> > the precise result type ...
>> >>
>> True, but IMHO that's still better.  You may want to check the OpenCL spec
>> which we tried to follow loosely as to what we allow.
>> >>
>> >> So again, implementing your A is ok with me.
>> >
>> > Well, the openCL spec says that operations between signed and unsigned
>> > vectors are simply prohibited (both openCL 1.2 and openCL 2.0 agree on
>> > this, and it matches the behavior of my old openCL compiler ...):
> [snip]
>> The question is what the fallout is if we reject this by default (I suppose
>> we accept it with -flax-vector-conversions).  I'm ok with following
>> OpenCL as well,
>> that is either solution that makes behavior consistent between C and C++.
>
> I agree that we should certainly continue to support mixed types with
> -flax-vector-conversions.  (This means we really should fix the C/C++
> inconsistency as to return type in any case, even if we disable mixed
> type support by default.)
>
> What the fallout of disabling mixed types by default would be is hard
> to say.  On the one hand, all other standards I've looked at (OpenCL,
> Altivec/VSX, Cell SPU) prohibit mixing signed/unsigned types.  So this
> hopefully means users of the GCC extension don't use this feature (much).
> [Of course, Altivec does not really talk about operators, but ideally
> GCC's a + b should be equivalent to the Altivec vec_add (a, b), which
> does not support mixing signed/unsigned types.]
>
> On the other hand, I've noticed at least two related areas where disabling
> mixed types could result in unexpected fallout: opaque types and platform
> specific vector types (e.g. "vector bool" on Altivec).
>
> Opaque types are a somewhat under-specified GCC feature that is used for
> different purposes by various platforms and the middle-end itself:
> - some platform-specific types (PPC SPE __ev64_opaque__ or MEP cp_vector)
> - function parameters for overloaded builtins in C before resolution
> - the output of vector comparison operators and vector truth_type_for
>
> It should be possible to use an opaque type together with vectors of
> different types, even with -flax-vector-conversions (and even if we
> disable support for signed/unsigned mixed types); the result of an
> operation on an opaque type and a non-opaque type should be the
> non-opaque type; it's not quite clear to me how operations on two
> different opaque types are (should be?) defined.
>
> Platform-specific types like Altivec "vector bool" are not really known
> to the middle-end; this particular case is treated just like a normal
> unsigned integer vector type.  This means that as a side-effect of
> disabling signed/unsigned mixed types, we would also disallow mixing
> signed/bool types.  But at least for Altivec vec_add, the latter is
> explicitly *allowed* (returning the signed type).  It would certainly
> be preferable for + to be compatible to vec_add also for the bool types.
> [ Note that the current implementation also does not fully match that
> goal, because while signed + bool is allowed, the return value is
> sometimes the bool type instead of the signed type.  ]
>
> This can probably only be fully solved by making the middle-end aware
> that things like "vector bool" need to be handled specially.  I had
> thought that maybe the back-end simply needs to mark "vector bool"
> as an opaque type (after all, the middle-end also uses this for
> vector truth types), but that doesn't work as-is, since one of the
> other features of opaque types is that they cannot be initialized.
> (Altivec vector bool *can* be initialized.)  Maybe those two features
> should be decoupled, so we can have opaque types used as truth types,
> and those that cannot be initialized ...
>
>
> So overall the list of actions probably looks like this:
>
> 1. Fix the selection of the common type to use for binary vector operations
>- C and C++ need to be consistent

Right.

>- If one type is opaque and the other isn't, use the non-opaque type

Correct.

>- If one type is unsigned and the other is signed, use the unsigned type

Yes.

>- What to do with different types where neither of the above rules apply?

Reject them?

> 2. Rework support for opaque and platform-specific vector types
>- Instead of the single "opaque" bit, have two flags:
>  * Mark type as compatible with other

Extending -flto parallel feature to the rest of the build

2014-12-17 Thread Lewis Hyatt
Hello-

I recently started using -flto in my builds; it's a very impressive
feature, thanks very much for adding it. One thing occurred to me while
switching over to it: in an LTO world, the object files, it seems to me,
are becoming increasingly less relevant, at least for some applications.
Since you are already committing to a long build time in return for the
run-time performance benefit, in a lot of cases it makes sense to go
whole-hog and just compile everything every time anyway. This comes with
a lot of advantages: besides fewer large files lying around, it
simplifies things a lot. Say I don't need to worry about accidentally
linking in an object file compiled differently from the rest (different
-march, different compiler, etc.), since I am just rebuilding from
scratch every time. In my use case, I do such things a lot, and find it
very freeing to know I don't need to worry about any state from a
previous build.

In any case, the above was some justification for why I think the
following feature would be appreciated and used by others as well.
It's perhaps a little surprising, or at least disappointing, that
this:

g++ -flto=jobserver *.o

will be parallelized, but this:

g++ -flto=jobserver *.cpp

will effectively not be; each .cpp is compiled serially, then the LTO
runs in parallel, but in many cases the first step dominates the build
time. Now, it's clear why things are done this way: if the user wants
to parallelize the compile, they are free to do so by naming each
object as a separate target in their Makefile and running a parallel
make. But this takes some effort to set up, especially if you want to
remove the intermediate .o files automatically, and since -flto has
already opened the door to gcc providing parallelization features, it
seems it would be nice to enable parallelization more generally, for
all parts of the build that could benefit from it.
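For comparison, the manual Makefile setup alluded to above might look like
this (a sketch; file names and flags are illustrative, and recipe lines must
begin with a tab):

```makefile
# Sketch of the existing workaround: one rule per object so make can
# parallelize compilation, then a parallel LTO link; the intermediate
# objects are removed after the link.  The '+' prefix lets the link
# recipe inherit make's jobserver for -flto=jobserver.
SRCS := $(wildcard *.cpp)
OBJS := $(SRCS:.cpp=.o)

app: $(OBJS)
	+g++ -flto=jobserver $(OBJS) -o $@
	rm -f $(OBJS)

%.o: %.cpp
	g++ -O2 -flto -c $< -o $@
```

Invoked as, e.g., "make -j8 app".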

I took a stab at implementing this. The below patch adds an option
-fparallel=(jobserver|N) that works analogously to -flto=, but applies
to the whole build. It generates a Makefile from each spec, with
appropriate dependencies, and then runs make to execute it. The
combination -fparallel=X -flto will also be parallelized on the lto
side as well, as if -flto=jobserver were specified; the idea would be
any downstream tool that could naturally offer parallel features would
do so in the presence of the -fparallel switch.
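As a concrete illustration of the proposed switch (hypothetical usage only;
-fparallel= exists nowhere but in this patch):

```shell
# Hypothetical: compile all translation units and run LTO with up to
# 8 parallel jobs, in one driver invocation.
g++ -fparallel=8 -flto -O2 *.cpp -o app

# Or, from a Makefile recipe, cooperate with the parent make's jobserver:
#   +g++ -fparallel=jobserver -flto *.cpp -o app
```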

I am sure this must be very rough around the edges; it's my first-ever
look at the gcc codebase, but I tried not to make it overly
restrictive. I only really have experience with Linux and C++, so I may
have inadvertently specialized something to those cases, but I did try
to keep it general. Here is a list of potential issues that could be
addressed:

-For some jobs there are environment variables set on a per-job basis.
I attempted to identify all of them and came up with COMPILER_PATH,
LIBRARY_PATH, and COLLECT_GCC_OPTIONS. This would need to be kept up
to date if others are added.

-The mechanism I used to propagate environment variables (export +
unset) is probably specific to the Bourne shell and wouldn't work on
other platforms, but there would be some simple platform-specific code
to do it right for Windows and others.

-Similarly for -pipe mode, I put pipes into the Makefile recipe, so
there may be platforms where this is not the correct syntax.

Anyway, here it is, in case there is any interest to pursue it
further. Thanks for listening...

-Lewis

=

diff --git gcc/common.opt gcc/common.opt
index 3b8b14d..4417847 100644
--- gcc/common.opt
+++ gcc/common.opt
@@ -1575,6 +1575,10 @@ flto=
 Common RejectNegative Joined Var(flag_lto)
 Link-time optimization with number of parallel jobs or jobserver.

+fparallel=
+Common Driver RejectNegative Joined Var(flag_parallel)
+Enable parallel build with number of parallel jobs or jobserver.
+
 Enum
 Name(lto_partition_model) Type(enum lto_partition_model)
UnknownError(unknown LTO partitioning model %qs)

diff --git gcc/gcc.c gcc/gcc.c
index a5408a4..6f9c1cd 100644
--- gcc/gcc.c
+++ gcc/gcc.c
@@ -1716,6 +1716,73 @@ static int have_c = 0;
 /* Was the option -o passed.  */
 static int have_o = 0;

+/* Parallel mode  */
+static int parallel = 0;
+static int parallel_ctr = 0;
+static int parallel_sctr = 0;
+static enum {
+  parallel_mode_off,
+  parallel_mode_first_job_in_spec,
+  parallel_mode_continued_spec
+} parallel_mode = parallel_mode_off;
+static bool jobserver = false;
+static FILE* mstream = NULL;
+static const char* makefile = NULL;
+
+/* helper to turn $ -> $$ for make and
+   maybe escape single quotes for the shell. */
+static void
+mstream_escape_puts (const char* string, bool single_quote)
+{
+  if (single_quote)
+    fputc ('\'', mstream);
+  for (; *string; string++)
+    {
+      if (*string == '$')
+        fputs ("$$", mstream);
+      else if (single_quote && *string == '\'')
+        fputs ("\'\\\'\'", mstream

Re: trying out openacc 2.0

2014-12-17 Thread Mark Farnell
But it would be highly unlikely to build the compiler only for the
accelerator; 99% of the time you build for both the host and the
accelerator.  So why can't we simplify the build process by allowing
users to specify the host architecture and list all the accelerators
at ./configure time, so that the user only invokes the compiler once
and builds all of them?

I guess the script can make a build directory for each host and
accelerator architecture, so that the object files of the different
architectures will not mix.  This would make the build process much
more user-friendly.



On Thu, Dec 18, 2014 at 12:10 AM, Tobias Burnus wrote:
> Mark Farnell wrote:
>> So what parameters will I need to pass to ./configure if I want to
>> support PTX offloading?
>
> Pre-remark: I think that the https://gcc.gnu.org/wiki/Offloading page will be
> updated once the support has been merged to the trunk.
>
> I think you use the triplet "nvptx-unknown-none" instead of
> "x86_64-intelmicemul-linux-gnu"; see
> https://gcc.gnu.org/wiki/Offloading#A1._Building_accel_compiler
>
>
>> So if I want to have CPU, KNL and PTX, do I end up building three compilers?
>
> That's my understanding. You then generate, for the offloading/target section,
> code for the host (as fallback) and for one or more accelerators. At
> invocation time, you then decide which accelerator to use (KNL, PTX, or
> host fallback), assuming that you targeted both accelerators during
> compilation.
>
>
>> Finally, is the nvptx-tools project mentioned in Tobias's page aiming
>> at replacing the CUDA toolchain?
>
> That depends on what you mean by "CUDA toolchain"; the purpose of those tools
> is described at https://gcc.gnu.org/ml/gcc-patches/2014-10/msg03372.html
>
> Namely, "as" just does some reordering and "ld" reduces the number of files
> by combining the PTX files. The actual 'assembler' is in the CUDA runtime
> library.
>
> However, GCC (plus those few auxiliary tools) does replace the compilers
> (nvcc etc.), since that task is handled by GCC itself.
>
>
>>> Also, are other GPUs such as the AMD ATI and the built-in GPUs such as
>>> the Intel GPU and AMD fusion supported?
>
> There was some work underway to support OpenACC with OpenCL as output,
> which is then fed to the OpenCL runtime library. The OpenACC part of
> that work ended up in gomp-4_0-branch and is hence not lost. I don't
> recall whether there was a branch or patch for the OpenCL support part.
>
> For AMD's HSA, see Jakub's email.
>
> Tobias


Re: GCC 4.8.4 Status Report (2014-12-05)

2014-12-17 Thread Mike Stump
On Dec 17, 2014, at 2:21 AM, FX  wrote:
>> Currently gcc 4.8.4 does not bootstrap on darwin14 (Yosemite) due to pr61407.
>> Would it be possible to apply the patch at 
>> https://gcc.gnu.org/bugzilla/attachment.cgi?id=33932
>> before 4.8.4 is released? Results with the patch are posted at
>> https://gcc.gnu.org/ml/gcc-testresults/2014-12/msg02096.html.
> 
> If Mike thinks it’s a good idea, I’ll do it. I’ve tested 4.8 with it multiple 
> times, and it works well.

Ok.


Re: GCC 4.8.4 Status Report (2014-12-05)

2014-12-17 Thread Mike Stump
On Dec 17, 2014, at 3:44 AM, Jakub Jelinek  wrote:
> On Wed, Dec 17, 2014 at 11:16:18AM +0100, Dominique Dhumieres wrote:
>> Currently gcc 4.8.4 does not bootstrap on darwin14 (Yosemite) due to pr61407.
> 
> Why has it not been pushed in earlier?

No good reason.  No one checked it into the release branch.  Before the OS
shipped, there might have been other things that needed changing; there
weren't.  I wanted it to bake on trunk before hitting the release branch,
which is why, when it went in originally, it didn't also go into the
release branch.  This is the first time it has been pinged.

> I guess if you test it sufficiently and the Darwin maintainer acks it,

I’ve Oked it in my other email.

> Note that the patch violates the coding conventions by using the CamelCase
> identifier minorDigitIdx.

I’m fine with the obvious fix to trunk to resolve that.

gcc-4.9-20141217 is now available

2014-12-17 Thread gccadmin
Snapshot gcc-4.9-20141217 is now available on
  ftp://gcc.gnu.org/pub/gcc/snapshots/4.9-20141217/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 4.9 SVN branch
with the following options: svn://gcc.gnu.org/svn/gcc/branches/gcc-4_9-branch 
revision 218841

You'll find:

 gcc-4.9-20141217.tar.bz2 Complete GCC

  MD5=dd448c802ad133cfebcbf2b1017726da
  SHA1=3017e88c319166d88cbe7d421a8e12b9afedfe9b
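The announced checksums can be verified after downloading the tarball, e.g.
(a sketch; assumes GNU coreutils' md5sum/sha1sum are available):

```shell
# Verify the published checksums against the downloaded file.
echo "dd448c802ad133cfebcbf2b1017726da  gcc-4.9-20141217.tar.bz2" | md5sum -c -
echo "3017e88c319166d88cbe7d421a8e12b9afedfe9b  gcc-4.9-20141217.tar.bz2" | sha1sum -c -
```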

Diffs from 4.9-20141210 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-4.9
link is updated and a message is sent to the gcc list.  Please do not use
a snapshot before it has been announced that way.