On Wed, 18 Jan 2017, Richard Biener wrote:
> > After OpenMP lowering, inlining might break this by inlining functions with
> > address-taken locals into SIMD regions. For now, such inlining is
> > disallowed
> > (this penalizes only SIMT code), but eventually that can be handled by
> > collecting
On Wed, 18 Jan 2017, Richard Biener wrote:
> But I guess I was asking whether you could initially emit
>
> void *omp_simt = IFN_GOMP_SIMT_ENTER (0);
>
> for (int i = n1; i < n2; i++)
> foo (&tmp);
>
> IFN_GOMP_SIMT_EXIT (omp_simt);
>
> and only after inlining do liveness / use analysi
On Wed, 18 Jan 2017, Jakub Jelinek wrote:
> Can't it be e.g. recorded inside a flag on the VAR_DECLs or magic attributes
> on them during omplower time and then only finalized into the magic .local
> alloca in the pass_omp_device_lower pass?
No (see my adjacent response): it can't be a variable fl
On Wed, 18 Jan 2017, Jakub Jelinek wrote:
> > +/* This structure is part of the interface between
> > lower_rec_simd_input_clauses
> > + and lower_rec_input_clauses. */
> > +
> > +struct omplow_simd_context {
> > + tree idx;
> > + tree lane;
> > + int max_vf;
> > + bool is_simt;
>
> Any re
On Wed, 18 Jan 2017, Jakub Jelinek wrote:
> On Wed, Jan 18, 2017 at 05:52:49PM +0300, Alexander Monakov wrote:
> > On Wed, 18 Jan 2017, Jakub Jelinek wrote:
> > > Can't it be e.g. recorded inside a flag on the VAR_DECLs or magic
> > > attributes
> > > o
On Wed, 18 Jan 2017, Jakub Jelinek wrote:
> We are talking here about addressable vars, right (so if we turn it into
> non-addressable, in the SIMT region we just use the normal PTX pseudos),
> right? We could emit inner ={v} {CLOBBER}; before SIMT_EXIT() to make it
> clear it shouldn't be moved a
On Wed, 18 Jan 2017, Jakub Jelinek wrote:
> > It is, but I think my approach is compatible with inlining too (and has a
> > more
> > localized impact on the compiler).
>
> But your 2/5 patch disables inlining into the SIMT regions. Or do you mean
> the approach with some new IFN for the pointers
Hello Jakub,
Sorry for not noticing this earlier, but ...
> +#ifdef __LP64__
> +typedef unsigned long long CUdeviceptr;
> +#else
> +typedef unsigned CUdeviceptr;
> +#endif
I think this #ifdef doesn't do the right thing on MinGW.
Would it be fine to simplify it? In my code I have
typedef uint
On Thu, 19 Jan 2017, Richard Biener wrote:
> >> What about motion in the other direction, upwards across SIMT_ENTER()?
> >
> > I think this is a question for Richard, whether it can be done in the alias
> > oracle. If yes, it supposedly can be done for both SIMT_ENTER and
> > SIMT_EXIT.
>
> Code
On Thu, 19 Jan 2017, Jakub Jelinek wrote:
> On Thu, Jan 19, 2017 at 10:45:08AM +0100, Richard Biener wrote:
> > > But in the escape analysis we could consider all the specially marked
> > > "omp simt private" addressable vars to escape and thus confine them into
> > > the
> > > SIMT region that wa
On Wed, 18 Jan 2017, Jakub Jelinek wrote:
> > Inlining needs to do just like omp-low; if we take the current framework, it
> > would need to collect addressable locals into one struct, replace
> > references to
> > those locals by field references in the inlined body. Then it needs to
> > appropr
On Wed, 18 Jan 2017, Jakub Jelinek wrote:
> On Wed, Jan 18, 2017 at 10:52:32PM +0300, Alexander Monakov wrote:
> > Sorry for not noticing this earlier, but ...
> >
> > > +#ifdef __LP64__
> > > +typedef unsigned long long CUdeviceptr;
> > > +#else
> >
On Thu, 19 Jan 2017, Jakub Jelinek wrote:
> On Thu, Jan 19, 2017 at 04:36:25PM +0300, Alexander Monakov wrote:
> > > One of the problems with that is that it means that you can't easily turn
> > > addressable private variables into non-addressable ones once you force
Hi,
Here's a different approach that doesn't introduce indirection for privatized
variables at all, and keeps dependencies obvious in the IR, but, on the flip
side, requires mentioning all subfields of privatized structures in a few
places.
For each privatized variable, add it to the list of outp
Hello,
This patchset implements privatization of addressable variables in OpenMP SIMD
regions lowered for SIMT targets (i.e. NVPTX) via the approach identified in
the review of the previous submission.
Now instead of explicitly privatizing those variables as fields of an
allocated struct up front
This patch implements rewriting of SIMT private variables as fields of a
struct by setting DECL_VALUE_EXPR on them and regimplifying statements.
* omp-offload.c: Include langhooks.h, tree-nested.h, stor-layout.h.
(ompdevlow_adjust_simt_enter): New.
(find_simtpriv_var_op): N
This patch adds handling of new omp_simt_enter/omp_simt_exit named insns
in the NVPTX backend.
* config/nvptx/nvptx-protos.h (nvptx_output_simt_enter): Declare.
(nvptx_output_simt_exit): Declare.
* config/nvptx/nvptx.c (nvptx_init_unisimt_predicate): Use
cfun->machi
This patch adjusts privatization in OpenMP SIMD loops lowered for SIMT targets.
At lowering time, private variables receive "omp simt private" attribute, get
mentioned in argument list of GOMP_SIMT_ENTER function, and get a clobbering
assignment just prior to GOMP_SIMT_EXIT function.
The following
This patch implements promotion of SIMT private variables if GOMP_SIMT_ENTER
is the only remaining statement where their address is taken, by handling it
similar to ASAN_MARK.
To avoid rebuilding GOMP_SIMT_ENTER statement from scratch, set argument
slot to a null pointer when the corresponding var
This patch implements privatization for SIMT during inlining. We need to
discover if the call being inlined belongs to a SIMT region (by looking at
simduid of the containing loop), and if so, treat them similar to OpenMP-SIMD
privatization: add the "omp simt private" attribute and mention them amo
On Thu, 23 Mar 2017, Jakub Jelinek wrote:
> > + if (vf != 1)
> > + continue;
> > + unlink_stmt_vdef (stmt);
>
> This is weird. AFAIK unlink_stmt_vdef just replaces the uses of the vdef
> of that stmt with the vuse, but it still keeps the vdef (and vuse) around
> on the stmt, t
On Thu, 23 Mar 2017, Jakub Jelinek wrote:
> On Wed, Mar 22, 2017 at 06:46:34PM +0300, Alexander Monakov wrote:
> > @@ -4730,6 +4746,25 @@ expand_call_inline (basic_block bb, gimple *stmt,
> > copy_body_data *id)
> >if (cfun->gimple_df)
> > pt_solution_re
On Thu, 23 Mar 2017, Jakub Jelinek wrote:
> > Sorry for missing the IR stability issue. This code relies on dst_simt_vars
> > being a set and thus having no duplicate entries (so the implicit lookup
> > when
> > adding an element is needed).
> >
> > However, I think I was overly cautious: lookin
On Thu, 23 Mar 2017, Jakub Jelinek wrote:
> And then clear it. That doesn't look like the right thing.
>
> So either you need some bool variable whether you've actually allocated
> the vector in the current expand_call_inline and use that instead of
> if (id->dst_simt_vars), or maybe you should c
Hello Bernd,
Can you have a look at this patch (unchanged from previous posting in January)?
The rest of the patches in the set are reviewed.
On Wed, 22 Mar 2017, Alexander Monakov wrote:
> This patch adds handling of new omp_simt_enter/omp_simt_exit named insns
> in the NVPTX b
Hello Jakub,
I've noticed while re-reading that this patch incorrectly handled SIMT case
in lower_lastprivate_clauses. The code was changed to look for variables
with "omp simt private" attribute, and was left under
'simduid && DECL_HAS_VALUE_EXPR_P (new_var)' condition. This effectively
constra
Hello,
I've applied the following patch as obvious to fix the -Wformat issue pointed
out by Thomas Schwinge.
* config/nvptx/nvptx.c (nvptx_output_softstack_switch): Correct format
string.
diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c
index 83f4610..4c35c16 1006
Ping.
> I've noticed while re-reading that this patch incorrectly handled SIMT case
> in lower_lastprivate_clauses. The code was changed to look for variables
> with "omp simt private" attribute, and was left under
> 'simduid && DECL_HAS_VALUE_EXPR_P (new_var)' condition. This effectively
> cons
Hi,
PR 71250 asks to explicitly document that -Wmissing-field-initializers warning
was enhanced some time ago to suppress warnings on uses of the universal zero
initializer { 0 } in C language. The documentation already calls out that the
warning is silenced in C++ for '{ }', the patch adds the c
Ping - as this patch addresses a wrong-code issue in new functionality, I'd like
to ask if it may be applied to gcc-7 branch too.
On Fri, 7 Apr 2017, Alexander Monakov wrote:
> Ping.
>
> > I've noticed while re-reading that this patch incorrectly ha
On Thu, 20 Apr 2017, Jakub Jelinek wrote:
> > This wasn't caught in testing, as apparently all testcases that have target
> > simd loops with linear/lastprivate clauses also have the corresponding
> > variables
> > mentioned in target map clause, which makes them addressable (is that
> > necessar
On Thu, 27 Apr 2017, Richard Biener wrote:
> struct q { int n; long o[100]; };
> struct r { int n; long o[0]; };
>
> union {
> struct r r;
> struct q q;
> } u;
>
> int foo (int i, int j)
> {
> long *q = u.r.o;
> u.r.o[i/j] = 1;
> return q[2];
> }
>
> but nothing convinced schedulin
Hello,
I'd like to apply the following LRA patch to make qsort comparator
reload_pseudo_compare_func proper (right now it lacks transitivity
due to incorrect use of non_reload_pseudos bitmap, PR 68988).
This function was originally a proper comparator, and the problematic
code was added as a fix
On Sat, 16 Sep 2017, Tom Tromey wrote:
> --- a/gcc/doc/cpp.texi
> +++ b/gcc/doc/cpp.texi
> @@ -1675,20 +1675,27 @@ macro. We could define @code{eprintf} like this,
> instead:
[snip]
> +This formulation looks more descriptive, but historically it was less
> +flexible: you had to supply at least on
Hi,
After recent changes, the member_name_cmp qsort comparator can indicate
A < B < A (i.e. lacks anti-commutativity) for distinct TYPE_DECL nodes
that have the same source location. If their order doesn't matter, the
comparator should return 0.
Invoking qsort with improper comparator at best ma
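
The property in question can be checked by brute force over a small array. The following is only a sketch, not GCC code: `struct decl_model`, `decl_cmp_broken`, `decl_cmp_fixed` and `check_antisymmetry` are hypothetical names, and the "broken" comparator is a simplified model of a comparator that never returns 0 for entries sharing a source location.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a declaration carrying a source location.  */
struct decl_model { int loc; };

/* Broken model: never returns 0, so two distinct entries at the same
   location each compare "greater" than the other.  */
static int
decl_cmp_broken (const void *a, const void *b)
{
  const struct decl_model *x = a, *y = b;
  return x->loc < y->loc ? -1 : 1;
}

/* Fixed model: entries whose order does not matter compare equal.  */
static int
decl_cmp_fixed (const void *a, const void *b)
{
  const struct decl_model *x = a, *y = b;
  return (x->loc > y->loc) - (x->loc < y->loc);
}

static int
sign (int v)
{
  return (v > 0) - (v < 0);
}

/* Return 1 if sign (cmp (a, b)) == -sign (cmp (b, a)) for every pair,
   0 otherwise.  */
static int
check_antisymmetry (const struct decl_model *v, size_t n,
                    int (*cmp) (const void *, const void *))
{
  assert (v != NULL || n == 0);
  for (size_t i = 0; i < n; i++)
    for (size_t j = 0; j < n; j++)
      if (sign (cmp (&v[i], &v[j])) != -sign (cmp (&v[j], &v[i])))
        return 0;
  return 1;
}
```

Running `check_antisymmetry` on an array that contains two entries with equal `loc` accepts the fixed model and rejects the broken one.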
Hello,
The autopref_rank_for_schedule qsort sub-comparator is not actually a proper
comparator. For instance, it lacks transitivity: if there are insns A, B, C
such that B has AUTOPREF_MULTIPASS_DATA_IRRELEVANT status, but A and C
compare such that C < A, we can have A == B == C < A according to th
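
The A == B == C < A situation can be reproduced with a small model. This is a sketch under assumptions, not the haifa-sched.c code: `struct insn_model` and its fields are hypothetical, the "broken" comparator simply treats an irrelevant element as equal to everything, and the "fixed" variant follows the idea stated in the ChangeLog entry for this fix (order 'irrelevant' insns first, compare the rest normally).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for per-insn autoprefetch data.  */
struct insn_model
{
  int irrelevant;   /* nonzero: no usable memory-access information */
  int offset;       /* meaningful only when !irrelevant */
};

/* Broken model: an irrelevant element compares equal to everything.  */
static int
insn_cmp_broken (const void *a, const void *b)
{
  const struct insn_model *x = a, *y = b;
  if (x->irrelevant || y->irrelevant)
    return 0;
  return (x->offset > y->offset) - (x->offset < y->offset);
}

/* Fixed model: irrelevant elements sort first and are equal only to
   each other; relevant elements are ordered by offset.  */
static int
insn_cmp_fixed (const void *a, const void *b)
{
  const struct insn_model *x = a, *y = b;
  if (!x->irrelevant != !y->irrelevant)
    return x->irrelevant ? -1 : 1;
  if (x->irrelevant)
    return 0;
  return (x->offset > y->offset) - (x->offset < y->offset);
}

/* Return 1 if a <= b and b <= c always imply a <= c, 0 otherwise.  */
static int
check_transitivity (const struct insn_model *v, size_t n,
                    int (*cmp) (const void *, const void *))
{
  assert (v != NULL || n == 0);
  for (size_t i = 0; i < n; i++)
    for (size_t j = 0; j < n; j++)
      for (size_t k = 0; k < n; k++)
        if (cmp (&v[i], &v[j]) <= 0
            && cmp (&v[j], &v[k]) <= 0
            && cmp (&v[i], &v[k]) > 0)
          return 0;
  return 1;
}
```

With A = {0, 2}, B = {1, 0} and C = {0, 1}, the broken model reports A == B and B == C while also reporting C < A, so the check fails; the fixed model passes.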
On Tue, 19 Sep 2017, Nathan Sidwell wrote:
> On 09/19/2017 07:06 AM, Alexander Monakov wrote:
> > Hi,
> >
> > After recent changes, the member_name_cmp qsort comparator can indicate
> > A < B < A (i.e. lacks anti-commutativity) for distinct TYPE_DECL nodes
>
On Tue, 19 Sep 2017, Nathan Sidwell wrote:
> > > > After recent changes, the member_name_cmp qsort comparator can indicate
> > > > A < B < A (i.e. lacks anti-commutativity) for distinct TYPE_DECL nodes
> > > > that have the same source location. If their order doesn't matter, the
> > > > comparato
On Tue, 19 Sep 2017, Maxim Kuvyrkov wrote:
> How about the following:
> 1. if both instructions are "irrelevant", then return "0".
> 2. if one instruction is "relevant" and another is "irrelevant", then
> "relevant" instruction is always greater (or lesser) than the non-relevant.
> 3. if both inst
> I'd like to keep read/write processing balanced. In the above "read" analysis
> has greater weight than "write" analysis. Also, autopref_rank_data() should
> not be called if !rtx_equal_p (data1->base, data2->base).
I'm afraid this doesn't work. Consider you have insns A, B, C such that all
a
On Wed, 20 Sep 2017, Nathan Sidwell wrote:
> > You can use the gcc_assert mentioned in the previous email on GCC
> > bootstrap/regtest to find examples. For me, the following example breaks
> > (no
> > command line flags needed, just bare 'cc1plus t.i'):
> >
> > struct
> > {
> >int a, b, c, d
Hi,
Most compiler invocations don't actually need an entropy source, so
open-read-close syscall sequence on /dev/urandom that GCC performs on
each startup is useless (and can easily be avoided).
This patch makes GCC read entropy from /dev/urandom lazily on first
call to get_random_seed, and en pa
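
The lazy-initialization idea can be sketched as follows. This is not the patch itself: `lazy_random_seed`, the `urandom_opens` counter and the `getpid` fallback are assumptions made for illustration, and the code assumes a POSIX system.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>

static uint64_t seed_value;
static int seed_initialized;
static int urandom_opens;   /* counts open attempts, for illustration only */

/* Return a seed, touching /dev/urandom only on the first call.
   Callers that never ask for a seed never pay for the syscalls.  */
uint64_t
lazy_random_seed (void)
{
  if (!seed_initialized)
    {
      int fd = open ("/dev/urandom", O_RDONLY);
      urandom_opens++;
      if (fd >= 0)
        {
          ssize_t got = read (fd, &seed_value, sizeof seed_value);
          if (got != (ssize_t) sizeof seed_value)
            seed_value = 0;   /* short read: fall back below */
          close (fd);
        }
      if (seed_value == 0)
        seed_value = (uint64_t) getpid ();   /* hypothetical fallback */
      seed_initialized = 1;
    }
  return seed_value;
}
```

A program that never calls `lazy_random_seed` performs no open-read-close sequence at all, and repeated calls reuse the cached value.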
On Thu, 21 Sep 2017, Jakub Jelinek wrote:
> Why isn't init_local_tick done at the get_random_seed time too?
> I.e. inlined into get_random_seed by hand like you've done for
> init_random_seed?
init_local_tick initializes the 'local_tick' global variable that
is directly accessed from coverage.c.
Hi,
The compare_access_positions qsort comparator lacks transitivity, although
somewhat surprisingly this issue didn't manifest on 64-bit x86 bootstraps.
The first invalid comparison step is here (tree-sra.c:1545):
/* Put the integral type with the bigger precision first. */
else if
On Tue, 19 Sep 2017, Alexander Monakov wrote:
> * haifa-sched.c (autopref_rank_for_schedule): Order 'irrelevant' insns
> first, always call autopref_rank_data otherwise.
May I apply this patch now to unblock qsort checking? Further changes or
adjustments can then go
On Thu, 21 Sep 2017, Richard Sandiford wrote:
> LGTM FWIW, but isn't there also the problem that the TYPE_PRECISION
> test fails to stabilise the sort if you have two integral types with
> the same precision?
Yes, but that's a pre-existing issue, so I didn't change it in the patch.
I think GCC bro
On Mon, 25 Sep 2017, Martin Jambor wrote:
> --- a/gcc/tree-sra.c
> +++ b/gcc/tree-sra.c
> @@ -1542,19 +1542,20 @@ compare_access_positions (const void *a, const void
> *b)
> && TREE_CODE (f2->type) != COMPLEX_TYPE
> && TREE_CODE (f2->type) != VECTOR_TYPE)
> return -
Hello,
I'm going to install the following patch on trunk in the next few hours.
This revision doesn't offer per-callsite opt-out anymore as suggested by
Richi on the Cauldron (made possible by fixing all known issues on trunk).
Thus this patch has a few minor differences compared to the previous
r
On Fri, 29 Sep 2017, Andrew Pinski wrote:
> > This patch (r253295) breaks the gcc build for aarch64-linux-gnu:
>
> I was just about to report the same thing.
I think autoprefetch ranking heuristic is still wrong if multi_mem_insn_p
may be true; please try this patch.
* haifa-sched.c (aut
On Tue, 3 Oct 2017, Jakub Jelinek wrote:
> The qsort cmp transitivity checks may fail with the sort_by_operand_rank
> comparator, because if one of the operands has stmt_to_insert and the
> other doesn't (but likely also if one SSA_NAME is (D) and the other is not),
> we fallthrough into SSA_NAME_V
On Wed, 4 Oct 2017, Ramana Radhakrishnan wrote:
> However we need a scheduler maintainer or global reviewer to please
> help review this patch or help come up with an alternative patch. A
> primary platform broken for 5 days with a commit and no public
> response from the original poster is really
Hello,
In ira-color.c, qsort comparator allocno_priority_compare_func lacks
anti-commutativity and can indicate A < B < A if both allocnos satisfy
non_spilled_static_chain_regno_p. It should fall down to following
sub-comparisons in that case.
There is another issue: the comment doesn't match
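
The fall-through described above might look like this in a reduced model. It is a sketch only: `struct allocno_model`, its fields, and the direction of each ordering are assumptions, not the ira-color.c code.

```c
#include <assert.h>

/* Hypothetical stand-in for an allocno.  */
struct allocno_model
{
  int static_chain;   /* nonzero if the special predicate holds */
  int priority;
  int num;            /* unique number, used as the final tie-breaker */
};

/* Broken model: when both elements satisfy the predicate, each one
   compares "greater" than the other.  */
static int
allocno_cmp_broken (const void *a, const void *b)
{
  const struct allocno_model *x = a, *y = b;
  if (x->static_chain)
    return 1;
  else if (y->static_chain)
    return -1;
  if (x->priority != y->priority)
    return x->priority > y->priority ? -1 : 1;
  return (x->num > y->num) - (x->num < y->num);
}

/* Fixed model: the predicate decides only when exactly one element
   satisfies it; otherwise fall through to the later sub-comparisons.  */
static int
allocno_cmp_fixed (const void *a, const void *b)
{
  const struct allocno_model *x = a, *y = b;
  if (!x->static_chain != !y->static_chain)
    return x->static_chain ? 1 : -1;
  if (x->priority != y->priority)
    return x->priority > y->priority ? -1 : 1;
  return (x->num > y->num) - (x->num < y->num);
}
```

For two elements that both have `static_chain` set, the broken model returns 1 in both argument orders, while the fixed model returns opposite signs determined by `priority` and `num`.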
On Thu, 5 Oct 2017, Maxim Kuvyrkov wrote:
> I'm still working on analysis, but it appears to me that Alexander's patch
> (current state of trunk) fails qsort check due to not being symmetric for
> load/store analysis (write == 0 or write == 1) in comparisons with
> "irrelevant" instructions. Wilco
Hi,
Earlier Richard mentioned the possibility to special-case GOMP_SIMT_ENTER to
allow passing privatized variables to it by reference without making them
addressable. I now see that such special-casing is already done for
IFN_ATOMIC_COMPARE_EXCHANGE in tree-ssa.c: execute_update_addresses_taken
On Wed, 1 Feb 2017, Jakub Jelinek wrote:
> IFN_ASAN_POISON is treated that way too. That also means that if a
> variable is previously addressable and the only spot that takes its address
> is that IFN, it can be rewritten into SSA form, but the IFN has to be
> adjusted to something different whic
On Wed, 1 Feb 2017, Jakub Jelinek wrote:
> > Yes; I imagine the approach taken in patch 2/5 can be extended to achieve
> > this.
> > That is, instead of just storing a flag 'bool in_simtreg' in struct loop,
> > store
> > pointers to corresponding SIMT_ENTER/EXIT gimple statements, use a similar
>
On Thu, 16 Feb 2017, Thomas Schwinge wrote:
> On Mon, 9 Jan 2017 17:21:41 +0100, I wrote:
> > On Thu, 29 Dec 2016 16:15:01 +0100, Jakub Jelinek wrote:
> > > PR translation/78745
> > > * exgettext: Handle multi-line help texts in *.opt files.
> >
> > With this committed in r243981, I noticed t
On Thu, 16 Feb 2017, Jakub Jelinek wrote:
> On Thu, Feb 16, 2017 at 01:56:15PM +0300, Alexander Monakov wrote:
> Are you sure you can't have them in *.c file (e.g. by setting some variable
> to a spec string or similar)?
> I think it is better to scan all those files.
Hm, proba
On Fri, 17 Feb 2017, Cesar Philippidis wrote:
> > And then, I don't specifically have a problem with discontinuing CUDA 5.5
> > support, and require 6.5, for example, but that should be a conscious
> > decision.
>
> We should probably ditch CUDA 5.5. In fact, according to trunk's cuda.h,
> it requ
On Fri, 17 Feb 2017, Thomas Schwinge wrote:
> On Fri, 17 Feb 2017 14:00:09 +0100, I wrote:
> > [...] for "normal" functions there is no reason to use the
> > ".param" space for passing arguments in and out of functions. We can
> > then get rid of the boilerplate code to move ".param %in_ar*" into
Hello Nathan,
On Wed, 9 Mar 2016, Nathan Sidwell wrote:
> I've committed this to trunk, to remove the squashing of debug information.
> It appears to function correctly.
>
> I'd had this patch for a while, but forgot to commit it.
The preceding code special-casing response to -gstabs can also be
On Wed, 9 Mar 2016, Nathan Sidwell wrote:
> On 03/09/16 09:55, Alexander Monakov wrote:
> > The preceding code special-casing response to -gstabs can also be removed
> > after this patch. Should I submit the (trivial) removal patch?
>
> No. I found that necessary to stop
On Wed, 9 Mar 2016, Nathan Sidwell wrote:
> > Furthermore, this is not useful without support in libgomp/plugin-nvptx.c
> > and nvptx-none-run.c (PTX JIT does not propagate lineinfo by default).
> > Would you like me to submit patches for those?
>
> please.
Here's the pull request for nvptx-run.c
On Thu, 10 Mar 2016, Nathan Sidwell wrote:
> Hm, something must have changed since I found that sorry necessary.
As I already said in my opening sentence (not quoted in your response), you
removed the unnecessary override. This is exactly what lets toplevel code see
requested debug format now,
On Mon, 14 Mar 2016, Andrey Belevantsev wrote:
> In this case, we get an inconsistency between the sched-deps interface, saying
> we can't move an insn writing the si register through a vector insn, and the
> liveness analysis, saying we can. The latter doesn't take into account
> implicit_reg_pen
On Mon, 14 Mar 2016, Andrey Belevantsev wrote:
> Here we're looping because we decrease the counter of the insns we still can
> issue on a DEBUG_INSN thus rendering the counter negative. The fix is to not
> count debug insns in the corresponding code. The selective scheduling is
> known to spoil
On Mon, 14 Mar 2016, Andrey Belevantsev wrote:
> We speculate an insn in the PR but we do not make a check for it though we
> should. The thing that broke this was the fix for PR 45472. In that pr, we
> have moved a volatile insn too far up because we failed to merge the bits
> describing its vol
On Mon, 14 Mar 2016, Andrey Belevantsev wrote:
> We fail to find the proper seqno for the fresh bookkeeping copy in this PR.
> The problem is that in get_seqno_by_preds we are iterating over bb from the
> given insn backwards up to the first bb insn. We skip the initial insn when
> iterating over
On Tue, 15 Mar 2016, Marek Polacek wrote:
> This test fails for me due to
> cc1plus: warning: var-tracking-assignments changes selective scheduling
Thanks for the heads-up Marek, and sorry for the trouble. Like I said in the
adjacent reply, the warning is expected (I didn't realize the testsuite
On Tue, 15 Mar 2016, Andrey Belevantsev wrote:
> On 15.03.2016 20:44, Alexander Monakov wrote:
> > On Tue, 15 Mar 2016, Marek Polacek wrote:
> > > This test fails for me due to
> > > cc1plus: warning: var-tracking-assignments changes selective scheduling
> >
>
error reporting
(this is a regression that is also visible on trunk with OpenACC offloading),
and patch 4 is a slightly more comprehensive fix to nvptx debuginfo generation.
Alexander Monakov (7):
libgomp: remove paste error in gomp_team_barrier_wait_end
nvptx libgcc: use attribute shared
* config/nvptx/bar.c: Remove wrong invocation of
gomp_barrier_wait_end from gomp_team_barrier_wait_end.
---
libgomp/ChangeLog.gomp-nvptx | 5 +
libgomp/config/nvptx/bar.c | 2 --
2 files changed, 5 insertions(+), 2 deletions(-)
diff --git a/libgomp/config/nvptx/bar.c b/libgo
* config/nvptx/crt0.c (__nvptx_stacks): Define in C. Use it...
(__nvptx_uni): Ditto.
(__main): ...here instead of inline asm.
* config/nvptx/stacks.c (__nvptx_stacks): Define in C.
(__nvptx_uni): Ditto.
---
libgcc/ChangeLog.gomp-nvptx | 8
libgcc
Recent testing uncovered that PTX JIT may reject attempts to use 'mul.u32' as
a non-widening 32-bit multiply instruction. Use 'mul.lo.u32' to fix 32-bit
code generation and conform to the PTX spec better.
* config/nvptx/nvptx.c (nvptx_init_unisimt_predicate): Emit
'mul.lo.u32' ins
Unlike cuMemFree and other resource-releasing functions called on exit,
cuMemFreeHost appears to re-report errors encountered in kernel launch.
This leads to a deadlock after GOMP_PLUGIN_fatal is reentered.
While the behavior on libgomp side is suboptimal (there's no need to
call resource-releasin
* config/nvptx/nvptx.c (nvptx_option_override): Remove custom handling
of debug info options.
---
gcc/ChangeLog.gomp-nvptx | 5 +
gcc/config/nvptx/nvptx.c | 9 -
2 files changed, 5 insertions(+), 9 deletions(-)
diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/n
Default definition of STACK_SIZE_MODE is word_mode, which is DImode on NVPTX.
However, stack pointer mode matches pointer mode, so needs to be SImode on
32-bit NVPTX ABI. Define it to Pmode to fix 32-bit code generation.
* config/nvptx/nvptx.h (STACK_SIZE_MODE): Define.
---
gcc/ChangeLog
POINTER_SIZE is the proper macro to retrieve pointer size in bits for the
target ABI, but new code incorrectly used BITS_PER_WORD, breaking 32-bit
code generation.
* config/nvptx/nvptx.c (nvptx_init_unisimt_predicate): Use
POINTER_SIZE instead of BITS_PER_WORD.
(nvptx_decla
Hi,
I'd like to note that I have a small patch on gomp-nvptx branch that deals
with the worst user-visible regression in a non-intrusive manner:
https://gcc.gnu.org/ml/gcc-patches/2016-03/msg01109.html
Alexander
Hi,
On Thu, 24 Mar 2016, Bernd Schmidt wrote:
> On 03/24/2016 11:17 AM, Aldy Hernandez wrote:
> > On 03/23/2016 10:25 AM, Bernd Schmidt wrote:
> > > It looks like this block of code is written by a helper function that is
> > > really intended for other purposes than for maximal_insn_latency. Migh
Loops lacking exit edges can trigger an NVIDIA driver sm_50 code generation
bug, which manifested as stack pointer (SASS register R1) corruption in this
case. Adjusting source by hand to arrange a cheap exit branch seems to be the
most reasonable workaround. NVIDIA bug ID 200177879.
* con
I have committed two nvptx libgomp tweaks to amonakov/gomp-nvptx branch, one
to improve efficiency, another to work around a Maxwell-specific driver bug.
Alexander Monakov (2):
libgomp: avoid malloc calls in gomp_nvptx_main
libgomp: avoid triggering a driver bug on sm_50
libgomp
Avoid calling malloc where it's easy to use stack storage instead: device
malloc is very slow in CUDA. This cuts about 60-80 microseconds from target
region entry/exit time, slimming down empty target regions from ~95 to ~17
microseconds (as measured on a GTX Titan).
* config/nvptx/target
This fixes a bug in the NVPTX backend where taking the address of a function
renamed by the backend (e.g. 'call' or 'malloc') would wrongly use the
original name. Now all decl renaming is handled up front via
TARGET_MANGLE_DECL_ASSEMBLER_NAME hook, which becomes the only caller of
nvptx_name_replac
Hello,
I have committed to trunk as obvious the following patch to add a missing
period at the end of help text for the '-moptimize' NVPTX backend option.
Alexander
* config/nvptx/nvptx.opt (moptimize): Add a period at end of help text.
--- gcc/config/nvptx/nvptx.opt
+++ gcc/config/nvpt
On Fri, 15 Apr 2016, Michael Matz wrote:
> On Thu, 14 Apr 2016, Maxim Kuvyrkov wrote:
>
> > It appears that implementing -fprolog-pad=N option in GCC will not
> > enable kernel live-patching support for AArch64. The proposal for the
> > option was to make GCC output a given number of NOPs at th
On Fri, 15 Apr 2016, Alexander Monakov wrote:
> On Fri, 15 Apr 2016, Michael Matz wrote:
> > Replace first nop with a breakpoint, handle rest of patching in breakpoint
> > handler, patch breakpoint insn last, no need to atomically patch multiple
> > instructions.
>
On Thu, 14 Apr 2016, Szabolcs Nagy wrote:
> looking at [2] i don't see why
>
> func:
> mov x9, x30
> bl _tracefunc
>
>
> is not good for the kernel.
>
> mov x9, x30 is a nop at function entry, so in
> theory 4 byte atomic write should be enough
> to enable/disable tracing.
Overwriting x9
On Mon, 18 Apr 2016, Ramana Radhakrishnan wrote:
> On Mon, Apr 18, 2016 at 2:26 PM, Alexander Monakov wrote:
> > On Thu, 14 Apr 2016, Szabolcs Nagy wrote:
> >> looking at [2] i don't see why
> >>
> >> func:
> >> mov x9, x30
> >>
On Mon, 18 Apr 2016, Ramana Radhakrishnan wrote:
> > - and GCC is not smart enough to be aware that intra-TU calls to 'func' (the
> > function we're instrumenting) don't touch x16/x17. And GCC should be that
> > smart, if it's not, it's a bug, right? :)
> >
>
> That it already is - IIRC. Oth
On Mon, 18 Apr 2016, Szabolcs Nagy wrote:
> On 18/04/16 14:26, Alexander Monakov wrote:
> > On Thu, 14 Apr 2016, Szabolcs Nagy wrote:
> >> looking at [2] i don't see why
> >>
> >> func:
> >> mov x9, x30
> >> bl _tracefunc
> >>
This brings write_omp_entry code a bit closer in style to the rest of nvptx.c
by using write_fn_marker, and hopefully makes it a bit clearer. No functional
change.
* config/nvptx/nvptx.c (write_omp_entry): Adjust.
(nvptx_declare_function_name): Adjust.
---
Applied to amonakov/gomp-
On Tue, 19 Apr 2016, AKASHI Takahiro wrote:
> > > But if Szabolcs' two-instruction
> > > sequence in the adjacent subthread is sufficient, this is moot.
> >
> > . It can also be solved by having just one NOP after the function label,
> > and a number of them before, then no thread can be in the
On Tue, 19 Apr 2016, AKASHI Takahiro wrote:
> > looking at [2] i don't see why
> >
> > func:
> > mov x9, x30
> > bl _tracefunc
> >
>
> Actually,
> mov x9, x30
> bl _tracefunc
> mov x30, x9
>
I think here Szabolcs' point was that the last instruction can be eliminated:
_tr
* doc/extend.texi (Nvidia PTX Variable Attributes): New section.
---
Applied to amonakov/gomp-nvptx branch.
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index e11ce4d..5eeb179 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -5469,6 +5469,7 @@ attributes.
* MeP Vari
On Tue, 19 Apr 2016, Thomas Schwinge wrote:
> Well, I certainly had done at least some thinking before proposing this:
> we're talking about the libgomp "fatal exit" function, called when
> something has gone very wrong, and we're about to terminate the process,
> because there's no hope to recover
Hello!
In responses to this email, I'll be posting 9 NVPTX-specific patches that are
required for enabling OpenMP offloading. I intend to post corresponding
libgomp and middle-end changes that make these useful a bit later.
The patches are generated by taking a diff on amonakov/gomp-nvptx git br
trivially folded when compiling for non-SIMT execution;
otherwise they are kept, and expanded into these insns.
Previously posted here:
[gomp-nvptx 01/13] nvptx backend: new patterns for OpenMP SIMD-via-SIMT
https://gcc.gnu.org/ml/gcc-patches/2016-01/msg01550.html
2016-01-17 Alexander Monakov
ET_SIMT_VF
https://gcc.gnu.org/ml/gcc-patches/2015-12/msg00122.html
2015-12-09 Alexander Monakov
* config/nvptx/nvptx.c (nvptx_simt_vf): New.
(TARGET_SIMT_VF): Define.
* doc/tm.texi: Regenerate.
* doc/tm.texi.in: (TARGET_SIMT_VF): New hook.
* target.def:
ptx shared attribute
https://gcc.gnu.org/ml/gcc-patches/2016-04/msg00940.html
2016-04-19 Alexander Monakov
* doc/extend.texi (Nvidia PTX Variable Attributes): New section.
2016-01-17 Alexander Monakov
* config/nvptx/nvptx.c (nvptx_encode_section_info): Hand