On 11/22/2013 12:50 PM, Richard Sandiford wrote:
> Bernd Schmidt writes:
>> On 11/22/2013 09:15 AM, Richard Sandiford wrote:
>>> +extern int push_multiple_operation_p (rtx);
>>> +extern int pop_multiple_operation_p (rtx);
>>
>> I was once told that function
On 11/16/2013 11:01 AM, Chung-Lin Tang wrote:
> My responses to the various issues you raised are below. The new patch
> has been re-tested. Please see if you can approve for committing now.
I agree with all the comments Richard has been making, so I'll just add
a few other points.
> If you don't
On 11/23/2013 08:20 PM, Mike Stump wrote:
> Richi has asked that we break the wide-int patch so that the individual port
> and front end maintainers can review their parts without having to go through
> the entire patch. This patch covers the bfin port.
>
> Ok?
I haven't seen any updates on the
On 10/03/2014 04:31 PM, Andrey Turetskiy wrote:
I've applied your option patch on our offload branch (w/o
'-ftarget-options' switch yet) and it seems to be working fine.
However the patch looks a bit unfinished:
@@ -440,7 +554,11 @@ access_check (const char *name, int mode
static char*
pre
On 10/09/2014 02:07 PM, Ilya Verbin wrote:
+#ifndef ACCEL_COMPILER
/* We need to check standard_exec_prefix/just_machine_suffix/specs
for any override of as, ld and libraries. */
specs_file = (char *) alloca (strlen (standard_exec_prefix)
+ strlen (just_mach
On 10/13/2014 12:33 PM, Ilya Verbin wrote:
On 13 Oct 12:19, Jakub Jelinek wrote:
But I'd like to understand why is this one needed.
Why should the compilers care? Aggregates layout and alignment of
integral/floating types must match between host and offload compilers, sure,
but isn't that somet
On 10/14/2014 09:25 AM, Richard Biener wrote:
On Mon, 13 Oct 2014, Bernd Schmidt wrote:
On 10/13/2014 12:33 PM, Ilya Verbin wrote:
On 13 Oct 12:19, Jakub Jelinek wrote:
But I'd like to understand why is this one needed.
Why should the compilers care? Aggregates layout and alignme
This is a patch kit that adds the nvptx port to gcc. It contains
preliminary patches to add needed functionality, the target files, and
one somewhat optional patch with additional target tools. There'll be
more patch series, one for the testsuite, and one to make the offload
functionality work
Since it's a virtual target, I've chosen not to run register allocation.
This is one of the patches necessary to make that work, it primarily
adds a target hook to disable it and fixes some of the fallout.
Bernd
ptx doesn't have indirect jumps, so CODE_FOR_indirect_jump may not be
defined. Add a sorry.
Bernd
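As a rough illustration of the guard described above, here is a minimal standalone sketch. It is not the actual optabs.c change: emit_indirect_jump_sketch, sorry_unimplemented and last_diagnostic are hypothetical names made up for this example, and HAVE_indirect_jump is simply left undefined to model a target like ptx that has no indirect jump pattern.

```c
#include <stdio.h>
#include <string.h>

/* Sketch only.  In this file HAVE_indirect_jump is deliberately left
   undefined, modelling a target without an indirect jump pattern.  */

/* Hypothetical stand-in for a diagnostic routine; it just records the
   message so that a caller can inspect it.  */
static char last_diagnostic[128];

static void
sorry_unimplemented (const char *what)
{
  snprintf (last_diagnostic, sizeof last_diagnostic,
            "sorry, unimplemented: %s", what);
}

/* Hypothetical emitter: writes a textual jump into OUT if the target has
   the pattern, otherwise reports that the feature is unavailable.
   Returns 1 if something was emitted, 0 if a diagnostic was issued.  */
static int
emit_indirect_jump_sketch (const char *target_reg, char *out, size_t out_size)
{
#ifdef HAVE_indirect_jump
  snprintf (out, out_size, "jump *%s", target_reg);
  return 1;
#else
  (void) target_reg;
  if (out_size > 0)
    out[0] = '\0';
  sorry_unimplemented ("indirect jumps are not available on this target");
  return 0;
#endif
}
```

The point of the pattern is that the reference to the target-specific code is only compiled when the macro exists, so a port without the pattern still builds and the user gets a diagnostic instead of a crash.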
gcc/
* optabs.c (emit_indirect_jump): Test HAVE_indirect_jump and emit a
sorry if necessary.
Index: gcc/optabs.c
===
Since it's a virtual target, I've chosen not to run register allocation.
This is one of the patches necessary to make that work, it primarily
adds a target hook to disable it and fixes some of the fallout.
Bernd
gcc/
* target.def (no_register_allocation): New data hook.
* doc/tm.texi.in: A
Even when returning a structure by passing an invisible reference, gcc
still likes to set the return register to the address of the struct.
This is undesirable on ptx where things like the return register have to
be declared, and the function really returns void at ptx level. I've
added a targe
This stops most of the post-regalloc passes from being run if the target
doesn't want register allocation. I'd previously moved them all out of
postreload to the toplevel, but Jakub (I think) pointed out that the
idea is not to run them to avoid crashes if reload fails e.g. for an
invalid asm. So I'
ptx assembly follows rather different rules than what's typical
elsewhere. We need a new hook to add a " };" string when we are finished
outputting a variable with an initializer.
Bernd
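To make the idea concrete, here is a small self-contained sketch of a "declaration end" callback. Everything in it is an assumption made for illustration: struct outbuf, out_puts, emit_initialized_array and the two hook functions are hypothetical, the hook signature is not the one from target.def, and the output only loosely resembles a ptx variable definition.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical output buffer standing in for the assembler output file.  */
struct outbuf
{
  char text[256];
  size_t len;
};

static void
out_puts (struct outbuf *o, const char *s)
{
  size_t n = strlen (s);
  assert (o->len + n < sizeof o->text);
  memcpy (o->text + o->len, s, n + 1);
  o->len += n;
}

/* Hypothetical hook type: called once all of a variable's contents have
   been written.  */
typedef void (*decl_end_hook) (struct outbuf *);

/* A conventional target has nothing to add at this point.  */
static void
default_decl_end (struct outbuf *o)
{
  (void) o;
}

/* A ptx-like target closes the brace-enclosed initializer.  */
static void
ptx_like_decl_end (struct outbuf *o)
{
  out_puts (o, " };\n");
}

/* Write an initialized array, then let the hook finish the declaration.  */
static void
emit_initialized_array (struct outbuf *o, const char *name,
                        const int *vals, int n, decl_end_hook decl_end)
{
  char tmp[64];
  snprintf (tmp, sizeof tmp, ".global .u32 %s[%d] = {", name, n);
  out_puts (o, tmp);
  for (int i = 0; i < n; i++)
    {
      snprintf (tmp, sizeof tmp, "%s %d", i ? "," : "", vals[i]);
      out_puts (o, tmp);
    }
  decl_end (o);
}
```

Calling emit_initialized_array with ptx_like_decl_end and the values 1, 2, 3 for an array named tbl produces the text ".global .u32 tbl[3] = { 1, 2, 3 };" followed by a newline; with default_decl_end the closing string is simply not written.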
gcc/
* target.def (decl_end): New hook.
* varasm.c (assemble_variable_contents, assemble_constant_conten
On ptx, we'll be using pseudos to pass function args as well, and
there's one assert that needs to be toned down to make that work.
Bernd
gcc/
* expr.c (use_reg_mode): Just return for pseudo registers.
Index: gcc/expr.
In ptx assembly we need to decorate call insns with the arguments that
are being passed. We also need to know the exact function type. This is
kind of hard to do with the existing infrastructure since things like
function_arg are called at other times rather than just when emitting a
call, so t
ptx assembly requires that declarations are written for undefined
variables. This adds that functionality.
Bernd
gcc/
* target.def (assemble_undefined_decl): New hooks.
* hooks.c (hook_void_FILEptr_constcharptr_const_tree): New function.
* hooks.h (hook_void_FILEptr_constcharptr_const_tree
We skip the late compilation passes on ptx, but there's one piece we do
need - fixing up the function so that we get return insns in the right
places. This patch just makes thread_prologue_and_epilogue_insns
callable from the reorg pass.
Bernd
gcc/
* function.c (thread_prologue_and_epilogue
@@
+/* NVPTX common hooks.
+ Copyright (C) 2014 Free Software Foundation, Inc.
+ Contributed by Bernd Schmidt
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software
tm_file="${tm_file} newlib-stdint.h"
Index: git/gcc/config/nvptx/nvptx-as.c
===
--- /dev/null
+++ git/gcc/config/nvptx/nvptx-as.c
@@ -0,0 +1,961 @@
+/* An "assembler" for ptx.
+ Copyright (C) 2014 Free Software Foundation, Inc.
+ Contributed
On 10/21/2014 10:18 AM, Richard Biener wrote:
So with this restriction I wonder why it didn't make sense to go the
HSA "backend" route emitting PTX from a GIMPLE SSA pass. This
would have avoided the LTO dance as well ...
Quite simple - there isn't an established way to do this. If I'd known
On 10/21/2014 10:42 AM, Jakub Jelinek wrote:
On Mon, Oct 20, 2014 at 04:17:56PM +0200, Bernd Schmidt wrote:
* Can't emit initializers referring to their variable's address since
you can't write forward declarations for variables.
Can't that be handled by emitting the
This series modifies a large number of tests in order to clean up
testsuite results on nvptx. The goal here was never really to get an
entirely clean run - the target is just too different from conventional
ones - but to be able to test the compiler sufficiently to be sure that
it's in good sha
This deals with uses of alloca in the testsuite. Some tests require it
outright, others only at -O0, and others require it implicitly by
requiring an alignment for stack variables bigger than the target's
STACK_BOUNDARY. For the latter I've added explicit xfails.
Bernd
gcc/testsuite/
* lib
Since everything in ptx assembly is typed, K&R C is problematic. There
are a number of testcases that call functions with the wrong number of
arguments, or arguments of the wrong type. I've added a new feature,
untyped_assembly, which these tests now require. I've also used this for
tests using
Some tests use stdio functions which are unavailable with the cut-down
newlib I'm using for ptx testing. I'm somewhat uncertain what to do with
these; they are by no means the only unavailable library functions the
testsuite tries to use (signal is another example). Here's a patch which
deals wi
Some things don't fit into nice categories that apply to a larger set of
tests, or which are somewhat random like ptxas tool failures. For these
I've added xfails and skips.
Bernd
gcc/testsuite/
* lib/target-supports.exp (check_effective_target_trampolines,
check_profiling_available, check
This deals with tests requiring indirect jumps (including tests using
setjmp), label values, and nonlocal goto.
A subset of these tests uses the NO_LABEL_VALUES macro, but it's not
consistent across the testsuite. The feature test I wrote tests whether
that is defined and returns false for lab
This tweaks a few tests so that we don't have to skip them. This is
mostly concerned with declaring main properly, or changing other
declarations where the test does not seem to rely on the type mismatches.
I've also included one example of changing a function name to not be
"call", ptxas see
This tests for availability of return addresses in a number of tests.
Bernd
gcc/testsuite/
* lib/target-supports.exp (check_effective_target_return_address):
New function.
* gcc.c-torture/execute/20010122-1.c: Require return_address.
* gcc.c-torture/execute/20030323-1.c: Likewise.
* gcc.c
On 10/21/2014 05:27 PM, Jeff Law wrote:
More ptx tooling failures than I'd expect. I'll leave it up to you
whether or not to push on NVidia to fix some of those failures. The
timeouts seem particularly troublesome.
All I can say is that we've reported them.
Bernd
On 10/21/2014 05:16 PM, Jeff Law wrote:
On 10/21/14 14:15, Bernd Schmidt wrote:
Since everything in ptx assembly is typed, K&R C is problematic. There
are a number of testcases that call functions with the wrong number of
arguments, or arguments of the wrong type. I've added a ne
On 10/21/2014 08:26 PM, Jeff Law wrote:
* optabs.c (emit_indirect_jump): Test HAVE_indirect_jump and emit a
sorry if necessary.
So doesn't this imply no hot-cold partitioning since we use indirect
jumps to get across the partition? Similarly doesn't this imply other
missing features (se
On 10/21/2014 09:01 PM, Mike Stump wrote:
On Oct 21, 2014, at 7:17 AM, Bernd Schmidt
wrote:
Some tests use stdio functions which are unavailable with the
cut-down newlib I'm using for ptx testing. I'm somewhat uncertain
what to do with these; they are by no means the only unavailab
On 10/21/2014 11:11 PM, Jeff Law wrote:
On 10/20/14 14:29, Bernd Schmidt wrote:
In ptx assembly we need to decorate call insns with the arguments that
are being passed. We also need to know the exact function type. This is
kind of hard to do with the existing infrastructure since things like
On 10/21/2014 11:30 PM, Jakub Jelinek wrote:
At least for OpenMP, the best would be if the #pragma omp target regions
and/or #pragma omp declare target functions contain anything a particular
offloading accelerator can't handle, instead of failing the whole
compilation perhaps just emit some at l
This is a followup patch for the nvptx port. Since malloc and free are
magically provided by the ptx environment, but realloc is missing, it's
nontrivial to provide an implementation for it. The Fortran frontend
likes to generate calls to realloc, but in one case it seems like we can
compute th
On 10/21/2014 11:53 PM, Jeff Law wrote:
So, in the end I'm torn. I don't like adding new hooks when they're not
needed, but I have some reservations about relying on the order of stuff
in CALL_INSN_FUNCTION_USAGE and I worry a bit that you might end up with
stuff other than arguments on that li
On 10/22/2014 12:05 AM, Jeff Law wrote:
On 10/20/14 14:30, Bernd Schmidt wrote:
ptx assembly requires that declarations are written for undefined
variables. This adds that functionality.
Does this need to happen at the use site, or can it be deferred?
This is independent of use sites. The
On 10/22/2014 10:31 PM, Jeff Law wrote:
These tools currently require GNU extensions - something I probably
ought to fix if we decide to add them to the gcc build itself.
Would these be more appropriate in binutils?
I don't think so, given that we don't need any piece of regular
binutils. The
On 10/15/2014 03:52 PM, Richard Biener wrote:
I'd say that we eventually should have a type flag that says
"this is a va-list type". If we really need to know that - because
I don't understand why we need to do this - the context should
tell us exactly whether we deal with a va_list object or n
On 05/27/2014 04:01 PM, Tobias Burnus wrote:
Bernd Schmidt wrote:
Compiling Fortran code with the ptx backend I'm working on results in
assembler warnings about mismatch between function calls and function decls.
Bootstrapped and tested on x86_64-linux. Ok?
OK.
The change/bug is due
On 06/04/2014 09:40 AM, Tobias Burnus wrote:
Still untested patch, but I cannot resist pointing out stupid
typos by myself.
I intend to test the build and the patch - and then to
commit it as obvious. If you see problems with this approach
please scream now.
I have no idea about the appr
On 06/04/2014 09:40 AM, Tobias Burnus wrote:
Still untested patch, but I cannot resist pointing out stupid
typos by myself.
I intend to test the build and the patch - and then to
commit it as obvious. If you see problems with this approach
please scream now.
Even with this applied, I'm s
On 06/04/2014 10:36 PM, Tobias Burnus wrote:
Bernd Schmidt wrote:
Even with this applied, I'm still seeing similar failures.
I didn't claim that the patch would fix everything – nor that it was
well tested.
Just wanted to report back since the problem doesn't really sho
e gcc111 machine). Ok?
Bernd
commit af9bca9c6439e3f8f31b40d5813a3d016b1f21e5
Author: Bernd Schmidt
Date: Wed May 21 12:18:17 2014 +0200
Make a collect-utils library for use by tools like collect2 and lto-wrapper.
* Makefile.in (ALL_HOST_BACKEND_OBJS): Add collect-utils.o.
(lto-wrap
There's a problem when offloading from a compiler for one target machine
to another: the machine specific options don't necessarily match. This
patch tries to address this.
The idea is that since we have two options sections anyway, with
different section name prefixes, we can arrange to pass
On 04/17/2014 08:33 PM, Ilya Verbin wrote:
Could you please take a look at this patch? It fixes the ordering issue in the
tables stated above, and passes all the tests that I have. But I'm not sure
about its correctness from the architectural point of view.
I'm still skeptical relying on orde
On 05/25/2014 11:35 AM, Richard Sandiford wrote:
Bernd Schmidt writes:
On 02/13/2014 10:18 AM, Richard Sandiford wrote:
contrib/
* dg-extract-results.py: New file.
* dg-extract-results.sh: Use it if the environment seems suitable.
I'm now seeing the following:
Trac
9d0
Author: Bernd Schmidt
Date: Wed Jun 11 18:41:09 2014 +0200
Fix an issue with regimplification.
This is in preparation for the lower-address-spaces pass for the ptx port. We
need to teach the regimplifier how to handle the case when an ADDR_EXPR turns
into something els
There's code in regimplification that makes us use an extra temporary
when we encounter a call returning a non-BLKmode structure. This seems
somewhat inefficient and unnecessary, and when used from the
lower-addr-spaces pass I'm working on it leads to problems further
down that look like tree-ssa
icate: gimplify_arg uses is_gimple_lvalue in some cases
instead of is_gimple_val, and regimplification needs to match that.
Bootstrapped and tested on x86_64-linux, ok?
Bernd
commit c1296ac4f4e7e8f0fb9c87d71ca8194a8eac0067
Author: Bernd Schmidt
Date: Wed Jun 11 18:41:09 2014 +0200
Fix an
On 06/16/2014 01:24 PM, Richard Biener wrote:
On Mon, Jun 16, 2014 at 12:56 PM, Bernd Schmidt wrote:
For the ptx port, I've needed to write a new pass which ensures all objects
go into address spaces as required by the machine. This uses the
regimplification code in gimplify-me.c, and
On 06/16/2014 07:26 PM, Mike Stump wrote:
On Jun 16, 2014, at 3:56 AM, Bernd Schmidt
wrote:
For the ptx port, I've needed to write a new pass which ensures all
objects go into address spaces as required by the machine.
I have such a machine and I’ve always approached the problem fro
On 06/17/2014 08:20 PM, Ilya Verbin wrote:
Hello Bernd,
On 28 Feb 17:21, Bernd Schmidt wrote:
For your use case, I'd imagine the offload compiler would be built
relatively normally as a full build with
"--enable-as-accelerator-for=x86_64-linux", which would install it
into loca
On 06/18/2014 04:13 PM, Ilya Verbin wrote:
On 17 Jun 21:22, Bernd Schmidt wrote:
On 06/17/2014 08:20 PM, Ilya Verbin wrote:
I don't get this part of the plan. Where a host compiler will look for
mkoffloads?
E.g., first I configure/make/install the target gcc and corresponding mkof
On 06/19/2014 12:19 PM, Ilya Verbin wrote:
On 18 Jun 16:22, Bernd Schmidt wrote:
What I think you need to do is
For the first compiler:
--enable-as-accelerator-for=x86_64-pc-linux-gnu
--target=x86_64-intelmic-linux-gnu --prefix=/somewhere
No --enable-accelerator options at all. This should
I discovered that create_tmp_var is used in the gfortran frontend to
create static variables. IMO the function is not intended to do this,
and it causes problems for a modification I need to make to it which
assumes that it only creates local variables. So I've made a patch to
make fortran dire
On 06/17/2014 04:54 PM, Martin Jambor wrote:
Weird... does the following (untested) patch help?
diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c
index 0afa197..747b1b6 100644
--- a/gcc/tree-sra.c
+++ b/gcc/tree-sra.c
@@ -3277,6 +3277,8 @@ sra_modify_assign (gimple *stmt, gimple_stmt_iterator
*gsi)
On 07/03/2014 06:12 PM, DJ Delorie wrote:
The hardware transfers data in and out of byte-oriented memory in
TYPE_SIZE_UNITS chunks. Once in a hardware register, all operations
are either 8, 16, or 20 bits (TYPE_SIZE) in size. So yes, values are
padded in memory, but no, they are not padded in r
On 07/07/2014 04:50 PM, Ilya Verbin wrote:
2) Or should I build accel compiler as a cross from
x86_64-pc-linux-gnu to x86_64-intelmic-linux-gnu?
Yes, that's the general idea.
Will it help to distinguish the libs?
Is libgomp the only problematic one? (Does the accel compiler even need
one?)
On 05/12/2015 06:27 PM, Thomas Schwinge wrote:
Patch variant 1:
@@ -4266,7 +4266,7 @@ process_command (unsigned int decoded_op
}
gcc_assert (!IS_ABSOLUTE_PATH (tooldir_base_prefix));
- tooldir_prefix2 = concat (tooldir_base_prefix, spec_host_machine,
+ tooldir_prefix2 = concat (too
-19 Bernd Schmidt
+
+ * omp-builtins.def (GOACC_thread_broadcast,
+ GOACC_thread_broadcast_ll): New builtins.
+ * optabs.def (oacc_thread_broadcast_optab): New optab.
+ * builtins.c (expand_builtin_oacc_thread_broadcast): New function.
+ (expand_builtin): Use it.
+ * config/nvptx/nvptx.c
New field
+ warp_equal_pseudos.
+ * config/nvptx/nvptx.md (br_true, br_false): Add %U modifier.
+
2015-05-19 Bernd Schmidt
* omp-builtins.def (GOACC_thread_broadcast,
Index: gcc/config/nvptx/nvptx.c
===
--- gcc/config/nvptx/nvptx.c (r
On 05/20/2015 02:39 PM, Jakub Jelinek wrote:
On Wed, May 20, 2015 at 02:01:44PM +0200, Bernd Schmidt wrote:
To implement OpenACC vector-single mode, we need to ensure that only one
thread out of the group representing a worker executes. The others skip
computations but follow along the CFG, so
On 05/21/2015 09:12 AM, Thomas Schwinge wrote:
OK to commit?
gcc/
* config/nvptx/nvptx.md (allocate_stack): Rename to...
(allocate_stack_): ... this, and add :P on both
match_operand and unspec.
(allocate_stack): New expander.
If you really want to. It
23444)
+++ gcc/ChangeLog.gomp (working copy)
@@ -1,5 +1,15 @@
2015-05-20 Bernd Schmidt
+ * omp-low.c (struct omp_region): Add a gwv_this field.
+ (bb_region_map): New variable.
+ (find_omp_for_region_data, find_omp_target_region_data): New static
+ functions.
+ (build_omp_regions_1): Call them. Buil
y)
@@ -1,3 +1,18 @@
+2015-05-29 Bernd Schmidt
+
+ * gimple.def (GIMPLE_OMP_ENTRY_END): New code.
+ * gimple.h (gimple_build_omp_entry_end): Declare.
+ (CASE_GIMPLE_OMP): Add GIMPLE_OMP_ENTRY_END.
+ * gimple.c (gimple_build_omp_entry_end): New function.
+ * gimple-low.c (lower_stm
(working copy)
@@ -1,5 +1,13 @@
2015-05-29 Bernd Schmidt
+ * config/nvptx/nvptx.md (UNSPECV_BARSYNC): New constant.
+ (oacc_threadbarrier): New expander.
+ (threadbarrier_insn): New pattern.
+ * config/nvptx/nvptx.c (nvptx_cannot_copy_insn_p):
+ * omp-builtins.def (BUILT_IN_GOACC_THREADBARRIER
===
--- gcc/ChangeLog.gomp (revision 223870)
+++ gcc/ChangeLog.gomp (working copy)
@@ -1,5 +1,10 @@
2015-05-29 Bernd Schmidt
+ * omp-low.c (struct omp_context): Add worker_var and worker_count
+ fields.
+ (oacc_init_count_vars): New function.
+ (lower_omp_target): Call it.
+
* config
On 06/01/2015 12:10 PM, Tom de Vries wrote:
On 29/05/15 18:23, Bernd Schmidt wrote:
When predicating the code for OpenACC, we should avoid the entry block
in an offloaded region, which contains setup code that should be run in
every thread. The following patch adds a new marker statement that
Index: gcc/ChangeLog.gomp
===
--- gcc/ChangeLog.gomp (revision 223974)
+++ gcc/ChangeLog.gomp (working copy)
@@ -1,3 +1,29 @@
+2015-06-01 Bernd Schmidt
+
+ * gimple.h (struct gimple_statement_omp_parallel_layout): Add a
+ broadcast_a
On 06/02/2015 01:06 PM, Thomas Schwinge wrote:
On Mon, 1 Jun 2015 17:58:51 +0200, Bernd Schmidt
wrote:
This extends the previous vector-single support to also handle
worker-level predication. [...]
This causes the following regressions; would you please have a look?
I committed the
On 04/21/2015 05:58 PM, Jakub Jelinek wrote:
suggests that while it is nice that when building nvptx accel compiler
we build libgcc.a, libc.a, libm.a, libgfortran.a (and in the future hopefully
libgomp.a),
nothing attempts to link those in :(.
I have that fixed; I expect I'll get around to po
On 04/27/2015 06:08 PM, Thomas Schwinge wrote:
OK to do the following instead? (Coding style/code copied from
gcc/config/i386/intelmic-mkoffload.c for consistency.)
Err, was this a question for me? I'm fine with that too.
Bernd
Hi Jakub,
I have some questions about nvptx:
1) you've said that alloca isn't supported, but it seems
to be wired up and uses the %alloca documented in the PTX
manual, what is the issue with that? %alloca not being actually
implemented by the current PTX assembler or translator?
Y
On 11/14/2014 11:01 AM, Jakub Jelinek wrote:
On Fri, Nov 14, 2014 at 09:29:48AM +0100, Jakub Jelinek wrote:
I have some questions about nvptx:
Oh, and
5) I have noticed gcc doesn't generate the .uni suffixes anywhere,
while llvm generates them; are those appropriate only when a function
On 11/14/2014 12:03 PM, Jakub Jelinek wrote:
On Fri, Nov 14, 2014 at 11:57:57AM +0100, Richard Biener wrote:
? There are also some comments about stdarg.h and stdio.h ordering,
dunno what it comes from and if it is still relevant when we require
C++ compiler.
I think we should simply discoura
I'm adding Thomas and Cesar to the Cc list, they may have more insight
into CUDA library questions as I haven't really looked into that part
all that much.
On 11/14/2014 12:39 PM, Jakub Jelinek wrote:
On Fri, Nov 14, 2014 at 12:09:03PM +0100, Bernd Schmidt wrote:
I have some quest
On 11/14/2014 01:36 PM, Jakub Jelinek wrote:
Any way to query those limits? Size of .shared memory, number of threads in
warp, number of warps, etc.?
I'd have to google most of that. There seems to be a WARP_SZ constant
available in ptx to get the size of the warp.
In OpenACC, are all work
Hi Tobias,
Does printf work? I thought I/O is not supported? Or does it just accept
it for linking and drop it? I think Janne's patch has already dealt with
the issue of stack allocation.
printf (or more accurately vprintf) is supported by ptx as a magic
builtin function. We have a printf wra
The situation with debugging on ptx is a little strange - it allows
.file and .loc directives for line numbers, and it provides a way to
define dwarf2 debug sections - but as far as I can tell, there's no way
of putting useful or accurate information into the latter. There's also
the slight pro
On 11/05/2014 01:19 AM, Bernd Schmidt wrote:
On 11/04/2014 10:50 PM, Jeff Law wrote:
No, I don't think it's terminology. It's really that in effect we have
two targets. One is a normal CPU, the other is a GPU.
ie, there's nothing that says we won't have a GPU that
On 11/03/2014 11:23 PM, Jeff Law wrote:
On 11/01/14 05:51, Bernd Schmidt wrote:
LTO has a mechanism not to stream out common nodes that are expected to
be identical on each run. When using LTO to communicate between
compilers for different targets, the va_list_type_node and related ones
must be
On 11/05/2014 12:17 AM, Jeff Law wrote:
On 11/04/14 14:08, Bernd Schmidt wrote:
On 11/04/2014 10:01 PM, Jeff Law wrote:
Communication between host and GPU is all done via some form of memcpy,
so I wouldn't expect this to be a problem.
They still need to agree on the layout of the stru
On 11/14/2014 08:19 PM, Segher Boessenkool wrote:
+ /* If I2 is a PARALLEL of two SETs of REGs (and perhaps some CLOBBERs),
+ make those two SETs separate I1 and I2 insns, and make an I0 that is
+ the original I1. */
+ if (i0 == 0
+ && GET_CODE (PATTERN (i2)) == PARALLEL
+ &&
On 11/19/2014 02:50 AM, Bernd Schmidt wrote:
@@ -8417,6 +8926,9 @@ expand_omp_target (struct omp_region *region)
/* Add the new function to the offload table. */
vec_safe_push (offload_funcs, child_fn);
+ /* Add the new function to the offload table
Another change that's required is (something like) the following. For
ptx, we need to know whether to output something as a .func (callable
from ptx code) or a .kernel (callable from the host). That means we need
to mark the kernel functions somehow in omp-low.c, and the following
does that by
On 11/13/2014 05:06 AM, Jan Hubicka wrote:
this patch adds infrastructure for proper streaming and merging of
TREE_TARGET_OPTION.
This breaks the offloading path via LTO since it introduces an
incompatibility in LTO format between host and offload machine.
A very quick patch to fix it is bel
te this
to use a dollar sign.
The patch below does this at the lto-read stage. Bootstrapped on
x86_64-linux, ok if testing is successful?
Bernd
commit 26b41de43c6db6e2368a9511c589c433b1e49c96
Author: Bernd Schmidt
Date: Wed Nov 19 21:47:59 2014 +0100
Renaming for invalid symbols w
On 11/20/2014 07:52 AM, Jakub Jelinek wrote:
On Thu, Nov 20, 2014 at 03:19:11AM +0100, Bernd Schmidt wrote:
Thomas had apparently already pointed out an issue with the new gomp_target
class (there are multiple similar types of statements we want to handle with
OpenACC, they have different codes
On 11/20/2014 02:20 PM, Richard Biener wrote:
On Thu, 20 Nov 2014, Bernd Schmidt wrote:
On 11/13/2014 05:06 AM, Jan Hubicka wrote:
this patch adds infrastructure for proper streaming and merging of
TREE_TARGET_OPTION.
This breaks the offloading path via LTO since it introduces an
On 11/14/2014 10:28 PM, Tobias Burnus wrote:
All in all: Okay once testing has succeeded. I'd still prefer a few words on what's
excluded (or included) in minimal as a comment in configure.ac, but the
patch is also okay without.
I thought you meant something more than adding a comment. I've added
this in t
On 02/25/2015 11:28 AM, Thomas Schwinge wrote:
Am I on the right track with my assumption that it is correct that
nvptx.c:nvptx_option_override is not invoked in the offloading code path,
so we'd need a new target hook (?) to consolidate/override the options in
this scenario?
I'm surprised by
ree
situations and it didn't go through).
Bernd
diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 8e4b6c1..d5535f9 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,5 +1,8 @@
2015-03-27 Bernd Schmidt
+ * config/c6x/c6x.md (movmisalign): Use MEM_P, not
+ memory_operand.
+
PR targ
Bernd
commit 432e3b7c5e3e47fdc9232805519d54f516c18008
Author: Bernd Schmidt
Date: Fri Mar 27 13:32:31 2015 +0100
Fix c6x-uclinux build failure.
* config/c6x/constraints.md (S3): New constraint.
* config/c6x/c6x.md (real_jump): Use it.
diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 5cee0a5.
On 05/28/2015 05:08 PM, Jakub Jelinek wrote:
I understand it is more work, I'd just like to ask that when designing stuff
for the OpenACC offloading you (plural) try to take the other offloading
devices and host fallback into account.
The problem is that many of the transformations we need to
On 06/19/2015 02:25 PM, Jakub Jelinek wrote:
Emitting PTX specific code from current ompexp is highly undesirable of
course, but I must say I'm not a big fan of keeping the GOMP_* gimple trees
around for too long either, they were never meant to be used in low gimple,
and even all the early optimiz
On 06/19/2015 03:45 PM, Jakub Jelinek wrote:
I actually believe having some optimization passes in between the ompexp
and the lowering of the IR into the form PTX wants is highly desirable,
the form with the worker-single or vector-single mode lowered will contain
too complex CFG for many optimiz
On 06/22/2015 04:24 PM, Jakub Jelinek wrote:
I don't understand why lowering the way you suggest helps here at all.
In the proposed scheme, you essentially have whole function
in e.g. worker-single or vector-single mode, which you need to be able to
handle properly in any case, because users can