Hi!
On Mon, 20 Oct 2014 16:17:56 +0200, Bernd Schmidt
wrote:
> This is a patch kit that adds the nvptx port to gcc.
Committed to trunk in r220781:
commit 0f7695734890f93fe58179e36ac2f41bf4147d78
Author: tschwinge
Date: Wed Feb 18 08:01:03 2015 +
nvptx-none: Disable the lto-plugin.
Hi!
On Mon, 10 Nov 2014 17:19:57 +0100, Bernd Schmidt
wrote:
> I've now committed it, in the following form.
> --- /dev/null
> +++ b/gcc/config/nvptx/nvptx.h
> @@ -0,0 +1,356 @@
> +#define ASM_OUTPUT_ALIGN(FILE, POWER)
Committed to trunk in r218689:
commit 61f8a1bd770ded96fcff88f3cbc426a23c4
On 11/14/14 10:43, Jeff Law wrote:
On 11/14/14 04:09, Bernd Schmidt wrote:
Hi Jakub,
I have some questions about nvptx:
1) you've said that alloca isn't supported, but it seems
Yes, it's unimplemented. There's an internal declaration for it but that
seems to be as far as it goes, and that d
On 11/14/14 11:04, Jeff Law wrote:
On 11/14/14 05:36, Jakub Jelinek wrote:
So, for a warp, if some threads perform one branch of an if and other
threads another one, all threads perform the first one first (with some
maybe not doing anything), then all the threads the others (again, other
threa
On 11/14/14 05:36, Jakub Jelinek wrote:
So, for a warp, if some threads perform one branch of an if and other
threads another one, all threads perform the first one first (with some
maybe not doing anything), then all the threads the others (again, other
threads not doing anything)?
Nobody ever
On 11/14/14 04:39, Jakub Jelinek wrote:
:(. So what other option one has to implement something like TLS, even
using inline asm or similar? There is %tid, so perhaps indexing some array
with %tid? The trouble with that is that some thread can do
#pragma omp parallel again, and I bet the %tid
On Fri, Nov 14, 2014 at 08:37:52AM -0800, Cesar Philippidis wrote:
> On 11/14/2014 08:18 AM, Jakub Jelinek wrote:
>
> >> Also, keep in mind that PTX doesn't have a global TID. The user needs to
> >> calculate it using ctaid/tid and friends.
> >
> > Ok. Is %gridid needed for that combo too?
>
>
On 11/14/14 04:39, Jakub Jelinek wrote:
On Fri, Nov 14, 2014 at 12:09:03PM +0100, Bernd Schmidt wrote:
I have some questions about nvptx:
1) you've said that alloca isn't supported, but it seems
to be wired up and uses the %alloca documented in the PTX
manual, what is the issue with that
On 11/14/14 04:09, Bernd Schmidt wrote:
Hi Jakub,
I have some questions about nvptx:
1) you've said that alloca isn't supported, but it seems
to be wired up and uses the %alloca documented in the PTX
manual, what is the issue with that? %alloca not being actually
implemented by the
On 11/14/2014 08:18 AM, Jakub Jelinek wrote:
>> Also, keep in mind that PTX doesn't have a global TID. The user needs to
>> calculate it using ctaid/tid and friends.
>
> Ok. Is %gridid needed for that combo too?
Eventually, probably. Currently, we're launching all of our kernels with
cuLaunchKe
On Fri, Nov 14, 2014 at 07:37:49AM -0800, Cesar Philippidis wrote:
> > Hmm. It's worthwhile to keep in mind that GPU threads really behave
> > somewhat differently from CPUs (they don't really execute
> > independently); the OMP model may just be a poor match for the
> > architecture in general.
>
On 11/14/2014 04:12 AM, Bernd Schmidt wrote:
- we'll need some synchronization primitives, I see atomic
support is
there, we need mutexes and semaphores I think, is that
implementable
using bar instruction?
>>>
>>> It's probably membar you need.
>>
>> That
On 11/14/2014 01:36 PM, Jakub Jelinek wrote:
Any way to query those limits? Size of .shared memory, number of threads in
warp, number of warps, etc.?
I'd have to google most of that. There seems to be a WARP_SZ constant
available in ptx to get the size of the warp.
In OpenACC, are all work
On Fri, Nov 14, 2014 at 01:12:40PM +0100, Bernd Schmidt wrote:
> >:(. So what other option one has to implement something like TLS, even
> >using inline asm or similar? There is %tid, so perhaps indexing some array
> >with %tid?
>
> That ought to work. For performance you'd want that array in .s
I'm adding Thomas and Cesar to the Cc list, they may have more insight
into CUDA library questions as I haven't really looked into that part
all that much.
On 11/14/2014 12:39 PM, Jakub Jelinek wrote:
On Fri, Nov 14, 2014 at 12:09:03PM +0100, Bernd Schmidt wrote:
I have some questions about n
On Fri, Nov 14, 2014 at 12:09:03PM +0100, Bernd Schmidt wrote:
> >I have some questions about nvptx:
> >1) you've said that alloca isn't supported, but it seems
> >to be wired up and uses the %alloca documented in the PTX
> >manual, what is the issue with that? %alloca not being actually
>
On 11/14/2014 11:01 AM, Jakub Jelinek wrote:
On Fri, Nov 14, 2014 at 09:29:48AM +0100, Jakub Jelinek wrote:
I have some questions about nvptx:
Oh, and
5) I have noticed gcc doesn't generate the .uni suffixes anywhere,
while llvm generates them; are those appropriate only when a function
Hi Jakub,
I have some questions about nvptx:
1) you've said that alloca isn't supported, but it seems
to be wired up and uses the %alloca documented in the PTX
manual, what is the issue with that? %alloca not being actually
implemented by the current PTX assembler or translator?
Y
On Fri, Nov 14, 2014 at 09:29:48AM +0100, Jakub Jelinek wrote:
> I have some questions about nvptx:
Oh, and
5) I have noticed gcc doesn't generate the .uni suffixes anywhere,
while llvm generates them; are those appropriate only when a function
is guaranteed to be run unconditionally from th
On 11/12/14 05:34, Richard Biener wrote:
Now that this has been committed - I notice that there is no entry
in MAINTAINERS for the port. I propose Bernd.
Well, ahead of you there. I proposed Bernd to the steering committee
as the maintainer a little while ago. I need to go back and count v
On Mon, Oct 20, 2014 at 4:17 PM, Bernd Schmidt wrote:
> This is a patch kit that adds the nvptx port to gcc. It contains preliminary
> patches to add needed functionality, the target files, and one somewhat
> optional patch with additional target tools. There'll be more patch series,
> one for the
On Nov 10, 2014, at 12:37 PM, H.J. Lu wrote:
> I also checked in this patch to add missing braces in
> gcc.dg/pr44194-1.c.
Thanks.
On Mon, Nov 10, 2014 at 12:04 PM, Jakub Jelinek wrote:
> On Mon, Nov 10, 2014 at 05:19:57PM +0100, Bernd Schmidt wrote:
>> commit 659744a99d815b168716b4460e32f6a21593e494
>> Author: Bernd Schmidt
>> Date: Thu Nov 6 19:03:57 2014 +0100
>
> Note, in r217301 you've committed a change to pr35468.c,
On Mon, Nov 10, 2014 at 12:04 PM, Jakub Jelinek wrote:
> On Mon, Nov 10, 2014 at 05:19:57PM +0100, Bernd Schmidt wrote:
>> commit 659744a99d815b168716b4460e32f6a21593e494
>> Author: Bernd Schmidt
>> Date: Thu Nov 6 19:03:57 2014 +0100
>
> Note, in r217301 you've committed a change to pr35468.c,
On Mon, Nov 10, 2014 at 05:19:57PM +0100, Bernd Schmidt wrote:
> commit 659744a99d815b168716b4460e32f6a21593e494
> Author: Bernd Schmidt
> Date: Thu Nov 6 19:03:57 2014 +0100
Note, in r217301 you've committed a change to pr35468.c, not mentioned in
the ChangeLog, that uses no_const_addr_space e
On 10/30/2014 12:35 AM, Jeff Law wrote:
A "nit" -- Richard S. recently removed the need to include the "enum"
for "enum machine_mode". I believe he had a script to handle the
mundane parts of that change. Please make sure to update the nvptx port
to conform to that new convention, obviously fee
On 11/05/14 05:01, Bernd Schmidt wrote:
On 10/22/2014 08:11 PM, Jeff Law wrote:
I'm not going to insist you do this in the same way as the PA. That was
a different era -- we had significant motivation to make things work in
such a way that everything could be buried in the pa specific files.
Th
On 11/04/2014 05:51 PM, Bernd Schmidt wrote:
On 11/04/2014 05:48 PM, Richard Henderson wrote:
On 10/28/2014 03:56 PM, Bernd Schmidt wrote:
+nvptx_ptx_type_from_mode (enum machine_mode mode, bool promote)
+{
+ switch (mode)
+{
+case BLKmode:
+ return ".b8";
+case BImode:
+
On 10/22/2014 08:11 PM, Jeff Law wrote:
I'm not going to insist you do this in the same way as the PA. That was
a different era -- we had significant motivation to make things work in
such a way that everything could be buried in the pa specific files.
That sometimes led to less than optimal app
On 11/04/2014 05:48 PM, Richard Henderson wrote:
On 10/28/2014 03:56 PM, Bernd Schmidt wrote:
+nvptx_ptx_type_from_mode (enum machine_mode mode, bool promote)
+{
+ switch (mode)
+{
+case BLKmode:
+ return ".b8";
+case BImode:
+ return ".pred";
+case QImode:
+ if (
On 10/28/2014 03:56 PM, Bernd Schmidt wrote:
> +nvptx_ptx_type_from_mode (enum machine_mode mode, bool promote)
> +{
> + switch (mode)
> +{
> +case BLKmode:
> + return ".b8";
> +case BImode:
> + return ".pred";
> +case QImode:
> + if (promote)
> + return ".u32";
On 11/04/2014 04:32 PM, Bernd Schmidt wrote:
> On 10/20/2014 04:19 PM, Bernd Schmidt wrote:
>> ptx doesn't have indirect jumps, so CODE_FOR_indirect_jump may not be
>> defined. Add a sorry.
>
> Looking back through all the mails it turns out this one wasn't approved yet.
> Ping?
Ok.
r~
On 10/20/2014 04:19 PM, Bernd Schmidt wrote:
ptx doesn't have indirect jumps, so CODE_FOR_indirect_jump may not be
defined. Add a sorry.
Looking back through all the mails it turns out this one wasn't approved
yet. Ping?
Bernd
On 10/31/14 17:50, Bernd Schmidt wrote:
On 10/31/2014 09:56 PM, Jeff Law wrote:
Pondering this a bit more, I think this is fine in concept. As you
note, removing the GNU extensions or at least making them conditional
would be good since these are going to be built with the host tools.
I'm not
On 10/31/2014 09:56 PM, Jeff Law wrote:
Pondering this a bit more, I think this is fine in concept. As you
note, removing the GNU extensions or at least making them conditional
would be good since these are going to be built with the host tools.
I'm not going to dig into the implementations...
On 10/20/14 08:48, Bernd Schmidt wrote:
This is a "bonus" optional patch which adds ar, ranlib, as and ld to the
ptx port. This is not proper binutils; ar and ranlib are just linked to
the host versions, and the other two tools have the following functions:
* nvptx-as is required to convert the
On 10/29/14 17:55, Bernd Schmidt wrote:
Thanks! I've pinged some of the preliminary patches that went unapproved
up to this point.
Thanks.
One leftover issue, discussed in the [0/11] mail - what amount of
documentation is appropriate for this, given that we don't want to
support using this a
On 10/30/2014 12:35 AM, Jeff Law wrote:
A "nit" -- Richard S. recently removed the need to include the "enum"
for "enum machine_mode". I believe he had a script to handle the
mundane parts of that change. Please make sure to update the nvptx port
to conform to that new convention, obviously fee
On 10/28/14 08:56, Bernd Schmidt wrote:
I have patches that expose all the address spaces to the middle-end
through a lower-as pass that runs early. The preliminary patches for
that ran into some resistance and into general brokenness of our address
space support, so I decided to rip all that ou
On 10/28/14 08:49, Bernd Schmidt wrote:
On 10/22/2014 08:12 PM, Jeff Law wrote:
Yea, let's keep your approach. Just wanted to explore a bit since the
PA seems to have a variety of similar characteristics.
Here's an updated version of the patch. I experimented a little with ptx
calling convent
On 10/22/2014 08:01 PM, Jeff Law wrote:
Please make sure all the functions in nvptx.c have function comments.
Done, and replaced regno 4 with NVPTX_RETURN_REGNUM.
+const char *
+nvptx_output_call_insn (rtx insn, rtx result, rtx callee)
If possible, promote first argument to rtx_insn *.
Als
On 10/22/2014 08:12 PM, Jeff Law wrote:
Yea, let's keep your approach. Just wanted to explore a bit since the
PA seems to have a variety of similar characteristics.
Here's an updated version of the patch. I experimented a little with ptx
calling conventions and ran into an arg that had to be
On 10/22/14 15:11, Bernd Schmidt wrote:
On 10/22/2014 10:31 PM, Jeff Law wrote:
These tools currently require GNU extensions - something I probably
ought to fix if we decide to add them to the gcc build itself.
Would these be more appropriate in binutils?
I don't think so, given that we don't
On 10/22/2014 10:31 PM, Jeff Law wrote:
These tools currently require GNU extensions - something I probably
ought to fix if we decide to add them to the gcc build itself.
Would these be more appropriate in binutils?
I don't think so, given that we don't need any piece of regular
binutils. The
On 10/20/14 08:48, Bernd Schmidt wrote:
This is a "bonus" optional patch which adds ar, ranlib, as and ld to the
ptx port. This is not proper binutils; ar and ranlib are just linked to
the host versions, and the other two tools have the following functions:
* nvptx-as is required to convert the
On 10/21/14 16:15, Bernd Schmidt wrote:
On 10/22/2014 12:05 AM, Jeff Law wrote:
On 10/20/14 14:30, Bernd Schmidt wrote:
ptx assembly requires that declarations are written for undefined
variables. This adds that functionality.
Does this need to happen at the use site, or can it be deferred?
On 10/21/14 16:06, Bernd Schmidt wrote:
On 10/21/2014 11:53 PM, Jeff Law wrote:
So, in the end I'm torn. I don't like adding new hooks when they're not
needed, but I have some reservations about relying on the order of stuff
in CALL_INSN_FUNCTION_USAGE and I worry a bit that you might end up w
On 10/20/14 08:33, Bernd Schmidt wrote:
These are the main target files for the ptx port. t-nvptx is empty for
now but will grow some content with follow up patches.
Bernd
010-target.diff
* configure.ac: Allow configuring lto for nvptx.
* configure: Regenerate.
gcc/
On Wed, Oct 22, 2014 at 12:02:16PM +0200, Richard Biener wrote:
> > I'm not sure that's what you're suggesting, but at least on non-shared
> > memory offloading devices, you can't switch arbitrarily between
> > offloading device(s) and host-fallback, for you have to do data
> > management between t
On Wed, Oct 22, 2014 at 10:34 AM, Thomas Schwinge
wrote:
> Hi!
>
> On Wed, 22 Oct 2014 10:18:49 +0200, Richard Biener
> wrote:
>> On Tue, Oct 21, 2014 at 11:32 PM, Bernd Schmidt
>> wrote:
>> > On 10/21/2014 11:30 PM, Jakub Jelinek wrote:
>> >>
>> >> At least for OpenMP, the best would be if th
Hi!
On Wed, 22 Oct 2014 10:18:49 +0200, Richard Biener
wrote:
> On Tue, Oct 21, 2014 at 11:32 PM, Bernd Schmidt
> wrote:
> > On 10/21/2014 11:30 PM, Jakub Jelinek wrote:
> >>
> >> At least for OpenMP, the best would be if the #pragma omp target regions
> >> and/or #pragma omp declare target fu
On Wed, Oct 22, 2014 at 10:18:49AM +0200, Richard Biener wrote:
> On Tue, Oct 21, 2014 at 11:32 PM, Bernd Schmidt
> wrote:
> > On 10/21/2014 11:30 PM, Jakub Jelinek wrote:
> >>
> >> At least for OpenMP, the best would be if the #pragma omp target regions
> >> and/or #pragma omp declare target fun
On Tue, Oct 21, 2014 at 11:32 PM, Bernd Schmidt wrote:
> On 10/21/2014 11:30 PM, Jakub Jelinek wrote:
>>
>> At least for OpenMP, the best would be if the #pragma omp target regions
>> and/or #pragma omp declare target functions contain anything a particular
>> offloading accelerator can't handle,
On 10/22/2014 12:05 AM, Jeff Law wrote:
On 10/20/14 14:30, Bernd Schmidt wrote:
ptx assembly requires that declarations are written for undefined
variables. This adds that functionality.
Does this need to happen at the use site, or can it be deferred?
This is independent of use sites. The pat
On 10/21/2014 11:53 PM, Jeff Law wrote:
So, in the end I'm torn. I don't like adding new hooks when they're not
needed, but I have some reservations about relying on the order of stuff
in CALL_INSN_FUNCTION_USAGE and I worry a bit that you might end up with
stuff other than arguments on that li
On 10/20/14 14:32, Bernd Schmidt wrote:
We skip the late compilation passes on ptx, but there's one piece we do
need - fixing up the function so that we get return insns in the right
places. This patch just makes thread_prologue_and_epilogue_insns
callable from the reorg pass.
Bernd
009-proep.
On 10/20/14 14:30, Bernd Schmidt wrote:
ptx assembly requires that declarations are written for undefined
variables. This adds that functionality.
Bernd
008-undefdecl.diff
gcc/
* target.def (assemble_undefined_decl): New hooks.
* hooks.c (hook_void_FILEptr_constcharp
On 10/21/14 21:29, Bernd Schmidt wrote:
A normal call looks like
{
.param.u32 %retval_in;
.param.u64 %out_arg0;
st.param.u64 [%out_arg0], %r1400;
call (%retval_in), PopCnt, (%out_arg0);
ld.param.u32%r1403, [%retval_in];
}
which declares local variables for the args and retur
On 10/21/2014 11:30 PM, Jakub Jelinek wrote:
At least for OpenMP, the best would be if the #pragma omp target regions
and/or #pragma omp declare target functions contain anything a particular
offloading accelerator can't handle, instead of failing the whole
compilation perhaps just emit some at l
On 10/21/2014 11:11 PM, Jeff Law wrote:
On 10/20/14 14:29, Bernd Schmidt wrote:
In ptx assembly we need to decorate call insns with the arguments that
are being passed. We also need to know the exact function type. This is
kind of hard to do with the existing infrastructure since things like
fun
On Tue, Oct 21, 2014 at 11:00:35PM +0200, Bernd Schmidt wrote:
> On 10/21/2014 08:26 PM, Jeff Law wrote:
> >>* optabs.c (emit_indirect_jump): Test HAVE_indirect_jump and emit a
> >>sorry if necessary.
> >So doesn't this imply no hot-cold partitioning since we use indirect
> >jumps to get ac
On 10/20/14 14:29, Bernd Schmidt wrote:
In ptx assembly we need to decorate call insns with the arguments that
are being passed. We also need to know the exact function type. This is
kind of hard to do with the existing infrastructure since things like
function_arg are called at other times rathe
On 10/21/2014 08:26 PM, Jeff Law wrote:
* optabs.c (emit_indirect_jump): Test HAVE_indirect_jump and emit a
sorry if necessary.
So doesn't this imply no hot-cold partitioning since we use indirect
jumps to get across the partition? Similarly doesn't this imply other
missing features (se
On 10/20/14 14:26, Bernd Schmidt wrote:
On ptx, we'll be using pseudos to pass function args as well, and
there's one assert that needs to be toned town to make that work.
Bernd
006-usereg.diff
gcc/
* expr.c (use_reg_mode): Just return for pseudo registers.
OK.
I pondered
On 10/20/14 14:25, Bernd Schmidt wrote:
ptx assembly follows rather different rules than what's typical
elsewhere. We need a new hook to add a " };" string when we are finished
outputting a variable with an initializer.
Bernd
005-declend.diff
gcc/
* target.def (decl_end): Ne
On 10/20/14 14:24, Bernd Schmidt wrote:
This stops most of the post-regalloc passes to be run if the target
doesn't want register allocation. I'd previously moved them all out of
postreload to the toplevel, but Jakub (I think) pointed out that the
idea is not to run them to avoid crashes if reloa
On 10/20/14 14:22, Bernd Schmidt wrote:
Even when returning a structure by passing an invisible reference, gcc
still likes to set the return register to the address of the struct.
This is undesirable on ptx where things like the return register have to
be declared, and the function really returns
On 10/20/14 14:20, Bernd Schmidt wrote:
Since it's a virtual target, I've chosen not to run register allocation.
This is one of the patches necessary to make that work, it primarily
adds a target hook to disable it and fixes some of the fallout.
Bernd
002-noregalloc.diff
gcc/
On 10/20/14 14:19, Bernd Schmidt wrote:
ptx doesn't have indirect jumps, so CODE_FOR_indirect_jump may not be
defined. Add a sorry.
Bernd
001-indjumps.diff
gcc/
* optabs.c (emit_indirect_jump): Test HAVE_indirect_jump and emit a
sorry if necessary.
So doesn't this im
On Tue, Oct 21, 2014 at 12:53 PM, Bernd Schmidt wrote:
> On 10/21/2014 10:18 AM, Richard Biener wrote:
>>
>> So with this restriction I wonder why it didn't make sense to go the
>> HSA "backend" route emitting PTX from a GIMPLE SSA pass. This
>> would have avoided the LTO dance as well ...
>
>
>
On 10/21/2014 10:42 AM, Jakub Jelinek wrote:
On Mon, Oct 20, 2014 at 04:17:56PM +0200, Bernd Schmidt wrote:
* Can't emit initializers referring to their variable's address since
you can't write forward declarations for variables.
Can't that be handled by emitting the initializer without
On 10/21/2014 10:18 AM, Richard Biener wrote:
So with this restriction I wonder why it didn't make sense to go the
HSA "backend" route emitting PTX from a GIMPLE SSA pass. This
would have avoided the LTO dance as well ...
Quite simple - there isn't an established way to do this. If I'd known
On Mon, Oct 20, 2014 at 04:17:56PM +0200, Bernd Schmidt wrote:
> * Can't emit initializers referring to their variable's address since
>you can't write forward declarations for variables.
Can't that be handled by emitting the initializer without the address and
some constructor that fixes up
On Mon, Oct 20, 2014 at 4:17 PM, Bernd Schmidt wrote:
> This is a patch kit that adds the nvptx port to gcc. It contains preliminary
> patches to add needed functionality, the target files, and one somewhat
> optional patch with additional target tools. There'll be more patch series,
> one for the
On Mon, 20 Oct 2014, Bernd Schmidt wrote:
> These tools currently require GNU extensions - something I probably ought to
> fix if we decide to add them to the gcc build itself.
And as regards library use, I'd expect the sources to start with #includes
of config.h and system.h (and so not include
Even when returning a structure by passing an invisible reference, gcc
still likes to set the return register to the address of the struct.
This is undesirable on ptx where things like the return register have to
be declared, and the function really returns void at ptx level. I've
added a targe
76 matches
Mail list logo