Re: [PATCH 0/5] OpenMP/PTX: improve correctness in SIMD regions

2017-01-18 Thread Alexander Monakov
On Wed, 18 Jan 2017, Richard Biener wrote: > > After OpenMP lowering, inlining might break this by inlining functions with > > address-taken locals into SIMD regions. For now, such inlining is > > disallowed > > (this penalizes only SIMT code), but eventually that can be handled by > > collecting

Re: [PATCH 0/5] OpenMP/PTX: improve correctness in SIMD regions

2017-01-18 Thread Alexander Monakov
On Wed, 18 Jan 2017, Richard Biener wrote: > But I guess I was asking whether you could initially emit > > void *omp_simt = IFN_GOMP_SIMT_ENTER (0); > > for (int i = n1; i < n2; i++) > foo (&tmp); > > IFN_GOMP_SIMT_EXIT (omp_simt); > > and only after inlining do liveness / use analysi

Re: [PATCH 0/5] OpenMP/PTX: improve correctness in SIMD regions

2017-01-18 Thread Alexander Monakov
On Wed, 18 Jan 2017, Jakub Jelinek wrote: > Can't it be e.g. recorded inside a flag on the VAR_DECLs or magic attributes > on them during omplower time and then only finalized into the magic .local > alloca in the pass_omp_device_lower pass? No (see my adjacent response): it can't be a variable fl

Re: [PATCH 1/5] omp-low: introduce omplow_simd_context

2017-01-18 Thread Alexander Monakov
On Wed, 18 Jan 2017, Jakub Jelinek wrote: > > +/* This structure is part of the interface between > > lower_rec_simd_input_clauses > > + and lower_rec_input_clauses. */ > > + > > +struct omplow_simd_context { > > + tree idx; > > + tree lane; > > + int max_vf; > > + bool is_simt; > > Any re

Re: [PATCH 0/5] OpenMP/PTX: improve correctness in SIMD regions

2017-01-18 Thread Alexander Monakov
On Wed, 18 Jan 2017, Jakub Jelinek wrote: > On Wed, Jan 18, 2017 at 05:52:49PM +0300, Alexander Monakov wrote: > > On Wed, 18 Jan 2017, Jakub Jelinek wrote: > > > Can't it be e.g. recorded inside a flag on the VAR_DECLs or magic > > > attributes > > > o

Re: [PATCH 0/5] OpenMP/PTX: improve correctness in SIMD regions

2017-01-18 Thread Alexander Monakov
On Wed, 18 Jan 2017, Jakub Jelinek wrote: > We are talking here about addressable vars, right (so if we turn it into > non-addressable, in the SIMT region we just use the normal PTX pseudos), > right? We could emit inner ={v} {CLOBBER}; before SIMT_EXIT() to make it > clear it shouldn't be moved a

Re: [PATCH 0/5] OpenMP/PTX: improve correctness in SIMD regions

2017-01-18 Thread Alexander Monakov
On Wed, 18 Jan 2017, Jakub Jelinek wrote: > > It is, but I think my approach is compatible with inlining too (and has a > > more > > localized impact on the compiler). > > But your 2/5 patch disables inlining into the SIMT regions. Or do you mean > the approach with some new IFN for the pointers

Re: [PATCH] Allow building GCC with PTX offloading even without CUDA being installed (gcc and nvptx-tools patches)

2017-01-18 Thread Alexander Monakov
Hello Jakub, Sorry for not noticing this earlier, but ... > +#ifdef __LP64__ > +typedef unsigned long long CUdeviceptr; > +#else > +typedef unsigned CUdeviceptr; > +#endif I think this #ifdef doesn't do the right thing on MinGW. Would it be fine to simplify it? In my code I have typedef uint

Re: [PATCH 0/5] OpenMP/PTX: improve correctness in SIMD regions

2017-01-19 Thread Alexander Monakov
On Thu, 19 Jan 2017, Richard Biener wrote: > >> What about motion in the other direction, upwards across SIMT_ENTER()? > > > > I think this is a question for Richard, whether it can be done in the alias > > oracle. If yes, it supposedly can be done for both SIMT_ENTER and > > SIMT_EXIT. > > Code

Re: [PATCH 0/5] OpenMP/PTX: improve correctness in SIMD regions

2017-01-19 Thread Alexander Monakov
On Thu, 19 Jan 2017, Jakub Jelinek wrote: > On Thu, Jan 19, 2017 at 10:45:08AM +0100, Richard Biener wrote: > > > But in the escape analysis we could consider all the specially marked > > > "omp simt private" addressable vars to escape and thus confine them into > > > the > > > SIMT region that wa

Re: [PATCH 0/5] OpenMP/PTX: improve correctness in SIMD regions

2017-01-19 Thread Alexander Monakov
On Wed, 18 Jan 2017, Jakub Jelinek wrote: > > Inlining needs to do just like omp-low; if we take the current framework, it > > would need to collect addressable locals into one struct, replace > > references to > > those locals by field references in the inlined body. Then it needs to > > appropr

Re: [PATCH] Allow building GCC with PTX offloading even without CUDA being installed (gcc and nvptx-tools patches)

2017-01-19 Thread Alexander Monakov
On Wed, 18 Jan 2017, Jakub Jelinek wrote: > On Wed, Jan 18, 2017 at 10:52:32PM +0300, Alexander Monakov wrote: > > Sorry for not noticing this earlier, but ... > > > > > +#ifdef __LP64__ > > > +typedef unsigned long long CUdeviceptr; > > > +#else > >

Re: [PATCH 0/5] OpenMP/PTX: improve correctness in SIMD regions

2017-01-19 Thread Alexander Monakov
On Thu, 19 Jan 2017, Jakub Jelinek wrote: > On Thu, Jan 19, 2017 at 04:36:25PM +0300, Alexander Monakov wrote: > > > One of the problems with that is that it means that you can't easily turn > > > addressable private variables into non-addressable ones once you force

Re: [PATCH 0/5] OpenMP/PTX: improve correctness in SIMD regions

2017-01-25 Thread Alexander Monakov
Hi, Here's a different approach that doesn't introduce indirection for privatized variables at all, and keeps dependencies obvious in the IR, but, on the flip side, requires mentioning all subfields of privatized structures in a few places. For each privatized variable, add it to the list of outp

[PATCHv2 0/5] OpenMP/PTX: improve correctness in SIMD regions

2017-03-22 Thread Alexander Monakov
Hello, This patchset implements privatization of addressable variables in OpenMP SIMD regions lowered for SIMT targets (i.e. NVPTX) via the approach identified in the review of the previous submission. Now instead of explicitly privatizing those variables as fields of an allocated struct up front

[PATCH 3/5] omp-offload: implement SIMT privatization, part 2

2017-03-22 Thread Alexander Monakov
This patch implements rewriting of SIMT private variables as fields of a struct by setting DECL_VALUE_EXPR on them and regimplifying statements. * omp-offload.c: Include langhooks.h, tree-nested.h, stor-layout.h. (ompdevlow_adjust_simt_enter): New. (find_simtpriv_var_op): N

[PATCH 1/5] nvptx: implement SIMT enter/exit insns

2017-03-22 Thread Alexander Monakov
This patch adds handling of new omp_simt_enter/omp_simt_exit named insns in the NVPTX backend. * config/nvptx/nvptx-protos.h (nvptx_output_simt_enter): Declare. (nvptx_output_simt_exit): Declare. * config/nvptx/nvptx.c (nvptx_init_unisimt_predicate): Use cfun->machi

[PATCH 2/5] omp-low: implement SIMT privatization, part 1

2017-03-22 Thread Alexander Monakov
This patch adjusts privatization in OpenMP SIMD loops lowered for SIMT targets. At lowering time, private variables receive "omp simt private" attribute, get mentioned in argument list of GOMP_SIMT_ENTER function, and get a clobbering assignment just prior to GOMP_SIMT_EXIT function. The following

[PATCH 5/5] address-taken: optimize SIMT privatized variables

2017-03-22 Thread Alexander Monakov
This patch implements promotion of SIMT private variables if GOMP_SIMT_ENTER is the only remaining statement where their address is taken, by handling it similarly to ASAN_MARK. To avoid rebuilding the GOMP_SIMT_ENTER statement from scratch, set the argument slot to a null pointer when the corresponding var

[PATCH 4/5] tree-inline: implement SIMT privatization, part 3

2017-03-22 Thread Alexander Monakov
This patch implements privatization for SIMT during inlining. We need to discover if the call being inlined belongs to a SIMT region (by looking at simduid of the containing loop), and if so, treat them similar to OpenMP-SIMD privatization: add the "omp simt private" attribute and mention them amo

Re: [PATCH 3/5] omp-offload: implement SIMT privatization, part 2

2017-03-23 Thread Alexander Monakov
On Thu, 23 Mar 2017, Jakub Jelinek wrote: > > + if (vf != 1) > > + continue; > > + unlink_stmt_vdef (stmt); > > This is weird. AFAIK unlink_stmt_vdef just replaces the uses of the vdef > of that stmt with the vuse, but it still keeps the vdef (and vuse) around > on the stmt, t

Re: [PATCH 4/5] tree-inline: implement SIMT privatization, part 3

2017-03-23 Thread Alexander Monakov
On Thu, 23 Mar 2017, Jakub Jelinek wrote: > On Wed, Mar 22, 2017 at 06:46:34PM +0300, Alexander Monakov wrote: > > @@ -4730,6 +4746,25 @@ expand_call_inline (basic_block bb, gimple *stmt, > > copy_body_data *id) > >if (cfun->gimple_df) > > pt_solution_re

Re: [PATCH 4/5] tree-inline: implement SIMT privatization, part 3

2017-03-23 Thread Alexander Monakov
On Thu, 23 Mar 2017, Jakub Jelinek wrote: > > Sorry for missing the IR stability issue. This code relies on dst_simt_vars > > being a set and thus having no duplicate entries (so the implicit lookup > > when > > adding an element is needed). > > > > However, I think I was overly cautious: lookin

Re: [PATCH 4/5] tree-inline: implement SIMT privatization, part 3

2017-03-23 Thread Alexander Monakov
On Thu, 23 Mar 2017, Jakub Jelinek wrote: > And then clear it. That doesn't look like the right thing. > > So either you need some bool variable whether you've actually allocated > the vector in the current expand_call_inline and use that instead of > if (id->dst_simt_vars), or maybe you should c

Re: [PATCH 1/5] nvptx: implement SIMT enter/exit insns

2017-03-27 Thread Alexander Monakov
Hello Bernd, Can you have a look at this patch (unchanged from previous posting in January)? The rest of the patches in the set are reviewed. On Wed, 22 Mar 2017, Alexander Monakov wrote: > This patch adds handling of new omp_simt_enter/omp_simt_exit named insns > in the NVPTX b

Re: [PATCH 2/5] omp-low: implement SIMT privatization, part 1

2017-03-31 Thread Alexander Monakov
Hello Jakub, I've noticed while re-reading that this patch incorrectly handled SIMT case in lower_lastprivate_clauses. The code was changed to look for variables with "omp simt private" attribute, and was left under 'simduid && DECL_HAS_VALUE_EXPR_P (new_var)' condition. This effectively constra

[committed] nvptx: correct -Wformat issue

2017-03-31 Thread Alexander Monakov
Hello, I've applied the following patch as obvious to fix the -Wformat issue pointed out by Thomas Schwinge. * config/nvptx/nvptx.c (nvptx_output_softstack_switch): Correct format string. diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c index 83f4610..4c35c16 1006

[PATCH] omp-low: fix lastprivate/linear lowering for SIMT

2017-04-07 Thread Alexander Monakov
Ping. > I've noticed while re-reading that this patch incorrectly handled SIMT case > in lower_lastprivate_clauses. The code was changed to look for variables > with "omp simt private" attribute, and was left under > 'simduid && DECL_HAS_VALUE_EXPR_P (new_var)' condition. This effectively > cons

[PATCH] doc: mention handling of {0} in -Wmissing-field-initializers (PR 71250)

2017-04-19 Thread Alexander Monakov
Hi, PR 71250 asks to explicitly document that -Wmissing-field-initializers warning was enhanced some time ago to suppress warnings on uses of the universal zero initializer { 0 } in C language. The documentation already calls out that the warning is silenced in C++ for '{ }', the patch adds the c
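
A minimal C sketch of the behavior the PR asks to document. The struct and variable names are hypothetical. When compiled with `gcc -Wextra` (which enables -Wmissing-field-initializers), the first initializer is expected to warn. Per the description above, the universal zero initializer `{ 0 }` is exempt from the warning in C.

```c
/* Hypothetical example type; any struct with several members works. */
struct triple {
    int f, g, h;
};

/* Expected to trigger -Wmissing-field-initializers (enabled by -Wextra):
   member h is not initialized explicitly (it is still zeroed). */
struct triple partial = { 3, 4 };

/* Universal zero initializer: in C, GCC does not warn here,
   and every member is zero-initialized. */
struct triple zeroed = { 0 };

/* Sum of all members, used to observe the initializers' effect. */
int triple_sum(const struct triple *t)
{
    return t->f + t->g + t->h;
}
```

Both forms zero every member they do not name. The only difference is whether the diagnostic fires.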

Re: [PATCH] omp-low: fix lastprivate/linear lowering for SIMT

2017-04-20 Thread Alexander Monakov
Ping - as this patch addresses a wrong-code issue in new functionality, I'd like to ask if it may be applied to gcc-7 branch too. On Fri, 7 Apr 2017, Alexander Monakov wrote: > Ping. > > > I've noticed while re-reading that this patch incorrectly ha

Re: [PATCH 2/5] omp-low: implement SIMT privatization, part 1

2017-04-20 Thread Alexander Monakov
On Thu, 20 Apr 2017, Jakub Jelinek wrote: > > This wasn't caught in testing, as apparently all testcases that have target > > simd loops with linear/lastprivate clauses also have the corresponding > > variables > > mentioned in target map clause, which makes them addressable (is that > > necessar

Re: [PATCH] Fix PR80533

2017-04-27 Thread Alexander Monakov
On Thu, 27 Apr 2017, Richard Biener wrote: > struct q { int n; long o[100]; }; > struct r { int n; long o[0]; }; > > union { > struct r r; > struct q q; > } u; > > int foo (int i, int j) > { > long *q = u.r.o; > u.r.o[i/j] = 1; > return q[2]; > } > > but nothing convinced schedulin

[PATCH] lra: make reload_pseudo_compare_func a proper comparator

2017-09-15 Thread Alexander Monakov
Hello, I'd like to apply the following LRA patch to make qsort comparator reload_pseudo_compare_func proper (right now it lacks transitivity due to incorrect use of non_reload_pseudos bitmap, PR 68988). This function was originally a proper comparator, and the problematic code was added as a fix

Re: [RFA] Implement __VA_OPT__

2017-09-17 Thread Alexander Monakov
On Sat, 16 Sep 2017, Tom Tromey wrote: > --- a/gcc/doc/cpp.texi > +++ b/gcc/doc/cpp.texi > @@ -1675,20 +1675,27 @@ macro. We could define @code{eprintf} like this, > instead: [snip] > +This formulation looks more descriptive, but historically it was less > +flexible: you had to supply at least on

[PATCH] cp: fix location comparison in member_name_cmp

2017-09-19 Thread Alexander Monakov
Hi, After recent changes, the member_name_cmp qsort comparator can indicate A < B < A (i.e. lacks anti-commutativity) for distinct TYPE_DECL nodes that have the same source location. If their order doesn't matter, the comparator should return 0. Invoking qsort with improper comparator at best ma
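
The failure mode can be shown in isolation. This is a generic sketch, not the actual member_name_cmp code. `cmp_loc_broken` and `cmp_loc_fixed` are hypothetical helpers that compare two source locations, modeled here as plain unsigned integers. When both locations are equal, the broken variant reports "less than" for both argument orders, which is exactly the A < B < A symptom.

```c
/* Hypothetical stand-in for a source location. */
typedef unsigned int loc_sketch;

/* Broken: when a == b this returns -1 for both (a, b) and (b, a),
   so the comparator claims A < B and B < A at the same time. */
int cmp_loc_broken(loc_sketch a, loc_sketch b)
{
    return a > b ? 1 : -1;
}

/* Fixed: equal locations compare equal, which keeps the result
   anti-commutative: sign(cmp(a, b)) == -sign(cmp(b, a)). */
int cmp_loc_fixed(loc_sketch a, loc_sketch b)
{
    if (a == b)
        return 0;
    return a > b ? 1 : -1;
}
```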

[PATCH] haifa-sched: fix autopref_rank_for_schedule qsort comparator

2017-09-19 Thread Alexander Monakov
Hello, The autopref_rank_for_schedule qsort sub-comparator is not actually a proper comparator. For instance, it lacks transitivity: if there are insns A, B, C such that B has AUTOPREF_MULTIPASS_DATA_IRRELEVANT status, but A and C compare such that C < A, we can have A == B == C < A according to th

Re: [PATCH] cp: fix location comparison in member_name_cmp

2017-09-19 Thread Alexander Monakov
On Tue, 19 Sep 2017, Nathan Sidwell wrote: > On 09/19/2017 07:06 AM, Alexander Monakov wrote: > > Hi, > > > > After recent changes, the member_name_cmp qsort comparator can indicate > > A < B < A (i.e. lacks anti-commutativity) for distinct TYPE_DECL nodes >

Re: [PATCH] cp: fix location comparison in member_name_cmp

2017-09-19 Thread Alexander Monakov
On Tue, 19 Sep 2017, Nathan Sidwell wrote: > > > > After recent changes, the member_name_cmp qsort comparator can indicate > > > > A < B < A (i.e. lacks anti-commutativity) for distinct TYPE_DECL nodes > > > > that have the same source location. If their order doesn't matter, the > > > > comparato

Re: [PATCH] haifa-sched: fix autopref_rank_for_schedule qsort comparator

2017-09-19 Thread Alexander Monakov
On Tue, 19 Sep 2017, Maxim Kuvyrkov wrote: > How about the following: > 1. if both instructions are "irrelevant", then return "0". > 2. if one instruction is "relevant" and another is "irrelevant", then > "relevant" instruction is always greater (or lesser) than the non-relevant. > 3. if both inst

Re: [PATCH] haifa-sched: fix autopref_rank_for_schedule qsort comparator

2017-09-19 Thread Alexander Monakov
> I'd like to keep read/write processing balanced. In the above "read" analysis > has greater weight than "write" analysis. Also, autopref_rank_data() should > not be called if !rtx_equal_p (data1->base, data2->base). I'm afraid this doesn't work. Consider you have insns A, B, C such that all a

Re: [PATCH] cp: fix location comparison in member_name_cmp

2017-09-20 Thread Alexander Monakov
On Wed, 20 Sep 2017, Nathan Sidwell wrote: > > You can use the gcc_assert mentioned in the previous email on GCC > > bootstrap/regtest to find examples. For me, the following example breaks > > (no > > command line flags needed, just bare 'cc1plus t.i'): > > > > struct > > { > >int a, b, c, d

[PATCH] toplev: read from /dev/urandom only when needed

2017-09-20 Thread Alexander Monakov
Hi, Most compiler invocations don't actually need an entropy source, so the open-read-close syscall sequence on /dev/urandom that GCC performs on each startup is useless (and can easily be avoided). This patch makes GCC read entropy from /dev/urandom lazily on the first call to get_random_seed, and en pa
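
A minimal sketch of the lazy-initialization idea, assuming a POSIX host. The names `lazy_random_seed`, `seed_value` and `seed_ready` are hypothetical, as is the time/pid fallback. This is not GCC's get_random_seed. It is also not thread-safe, which is acceptable for a single-threaded compiler process.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Cached seed; hypothetical names, not GCC's actual variables. */
static unsigned long long seed_value;
static int seed_ready;

/* Return the random seed, touching /dev/urandom only on the first call.
   Invocations that never ask for a seed perform no syscalls at all. */
unsigned long long lazy_random_seed(void)
{
    if (!seed_ready)
    {
        unsigned long long v = 0;
        int fd = open("/dev/urandom", O_RDONLY);
        if (fd >= 0)
        {
            if (read(fd, &v, sizeof v) != (ssize_t) sizeof v)
                v = 0;
            close(fd);
        }
        /* Fallback when the device is unavailable (an assumption of
           this sketch, not taken from the patch). */
        if (v == 0)
            v = (unsigned long long) time(NULL) ^ (unsigned long long) getpid();
        seed_value = v;
        seed_ready = 1;
    }
    return seed_value;
}
```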

Re: [PATCH] toplev: read from /dev/urandom only when needed

2017-09-21 Thread Alexander Monakov
On Thu, 21 Sep 2017, Jakub Jelinek wrote: > Why isn't init_local_tick done at the get_random_seed time too? > I.e. inlined into get_random_seed by hand like you've done for > init_random_seed? init_local_tick initializes the 'local_tick' global variable that is directly accessed from coverage.c.

[PATCH] tree-sra: fix compare_access_positions qsort comparator

2017-09-21 Thread Alexander Monakov
Hi, The compare_access_positions qsort comparator lacks transitivity, although somewhat surprisingly this issue didn't manifest on 64-bit x86 bootstraps. The first invalid comparison step is here (tree-sra.c:1545): /* Put the integral type with the bigger precision first. */ else if

Re: [PATCH] haifa-sched: fix autopref_rank_for_schedule qsort comparator

2017-09-22 Thread Alexander Monakov
On Tue, 19 Sep 2017, Alexander Monakov wrote: > * haifa-sched.c (autopref_rank_for_schedule): Order 'irrelevant' insns > first, always call autopref_rank_data otherwise. May I apply this patch now to unblock qsort checking? Further changes or adjustments can then go

Re: [PATCH] tree-sra: fix compare_access_positions qsort comparator

2017-09-25 Thread Alexander Monakov
On Thu, 21 Sep 2017, Richard Sandiford wrote: > LGTM FWIW, but isn't there also the problem that the TYPE_PRECISION > test fails to stabilise the sort if you have two integral types with > the same precision? Yes, but that's a pre-existing issue, so I didn't change it in the patch. I think GCC bro
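
qsort is not a stable sort. When the last sub-comparison can tie, as with two integral types of equal TYPE_PRECISION, the relative order of those elements is unspecified and may differ between qsort implementations or hosts. One generic remedy is a final tie-break on a key that is unique per element. The sketch below uses a hypothetical `struct item` rather than GCC's access structure, and `uid` is an assumed unique identifier.

```c
#include <stdlib.h>

/* Hypothetical element: a sort key that may tie, plus a unique id. */
struct item {
    unsigned precision;
    unsigned uid;
};

/* Bigger precision first; ties broken by ascending uid so that the
   result is fully determined regardless of the qsort implementation. */
int cmp_items(const void *pa, const void *pb)
{
    const struct item *a = pa, *b = pb;
    if (a->precision != b->precision)
        return a->precision > b->precision ? -1 : 1;
    if (a->uid != b->uid)
        return a->uid < b->uid ? -1 : 1;
    return 0;
}

void sort_items(struct item *v, size_t n)
{
    qsort(v, n, sizeof *v, cmp_items);
}
```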

Re: [PATCH] tree-sra: fix compare_access_positions qsort comparator

2017-09-25 Thread Alexander Monakov
On Mon, 25 Sep 2017, Martin Jambor wrote: > --- a/gcc/tree-sra.c > +++ b/gcc/tree-sra.c > @@ -1542,19 +1542,20 @@ compare_access_positions (const void *a, const void > *b) > && TREE_CODE (f2->type) != COMPLEX_TYPE > && TREE_CODE (f2->type) != VECTOR_TYPE) > return -

[reviewed] qsort comparator consistency checking

2017-09-29 Thread Alexander Monakov
Hello, I'm going to install the following patch on trunk in the next few hours. This revision doesn't offer per-callsite opt-out anymore as suggested by Richi on the Cauldron (made possible by fixing all known issues on trunk). Thus this patch has a few minor differences compared to the previous r
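
The properties such checking is meant to enforce can be stated as a brute-force test. These are the same properties the comparator fixes elsewhere in this thread restore. The sketch below is not the committed checker. `check_comparator` is a hypothetical O(n^3) helper that is only practical for small arrays. It reports which property fails: cmp(a, a) == 0, anti-commutativity, or transitivity.

```c
#include <stddef.h>

typedef int (*cmp_fn)(const void *, const void *);

/* Collapse a comparator result to -1, 0 or 1. */
static int sgn(int v)
{
    return (v > 0) - (v < 0);
}

/* Return 0 if CMP behaves consistently on the N elements of size SZ at
   BASE; otherwise 1 (cmp(a, a) != 0), 2 (anti-commutativity violated)
   or 3 (transitivity violated). Costs O(n^3) comparator calls. */
int check_comparator(const void *base, size_t n, size_t sz, cmp_fn cmp)
{
    const char *p = base;
    for (size_t i = 0; i < n; i++) {
        const void *a = p + i * sz;
        if (cmp(a, a) != 0)
            return 1;
        for (size_t j = 0; j < n; j++) {
            const void *b = p + j * sz;
            int ab = sgn(cmp(a, b));
            if (ab != -sgn(cmp(b, a)))
                return 2;
            for (size_t k = 0; k < n; k++) {
                const void *c = p + k * sz;
                int bc = sgn(cmp(b, c));
                int ac = sgn(cmp(a, c));
                /* a <= b and b <= c must imply a <= c ... */
                if (ab <= 0 && bc <= 0 && ac > 0)
                    return 3;
                /* ... and a == b, b == c must imply a == c. */
                if (ab == 0 && bc == 0 && ac != 0)
                    return 3;
            }
        }
    }
    return 0;
}

/* A proper comparator for ints. */
int cmp_int_ok(const void *pa, const void *pb)
{
    int x = *(const int *) pa, y = *(const int *) pb;
    return (x > y) - (x < y);
}

/* An improper one: it never returns 0, so equal elements are each
   "less than" the other. */
int cmp_int_bad(const void *pa, const void *pb)
{
    int x = *(const int *) pa, y = *(const int *) pb;
    return x > y ? 1 : -1;
}
```

Because every ordered triple is visited, combinations such as a < b with b == c but a == c are also caught, when the loop reaches the permuted triple (b, c, a).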

Re: [reviewed] qsort comparator consistency checking

2017-09-29 Thread Alexander Monakov
On Fri, 29 Sep 2017, Andrew Pinski wrote: > > This patch (r253295) breaks the gcc build for aarch64-linux-gnu: > > I was just about to report the same thing. I think autoprefetch ranking heuristic is still wrong if multi_mem_insn_p may be true; please try this patch. * haifa-sched.c (aut

Re: [PATCH] Fix sort_by_operand_rank with qsort checking (PR tree-optimization/82381)

2017-10-03 Thread Alexander Monakov
On Tue, 3 Oct 2017, Jakub Jelinek wrote: > The qsort cmp transitivity checks may fail with the sort_by_operand_rank > comparator, because if one of the operands has stmt_to_insert and the > other doesn't (but likely also if one SSA_NAME is (D) and the other is not), > we fallthrough into SSA_NAME_V

Re: [PATCH] Fix PR82396: qsort comparator non-negative on sorted output

2017-10-04 Thread Alexander Monakov
On Wed, 4 Oct 2017, Ramana Radhakrishnan wrote: > However we need a scheduler maintainer or global reviewer to please > help review this patch or help come up with an alternative patch. A > primary platform broken for 5 days with a commit and no public > response from the original poster is really

[PATCH] ira-color: fix allocno_priority_compare_func for qsort (PR 82395)

2017-10-05 Thread Alexander Monakov
Hello, In ira-color.c, qsort comparator allocno_priority_compare_func lacks anti-commutativity and can indicate A < B < A if both allocnos satisfy non_spilled_static_chain_regno_p. It should fall through to the following sub-comparisons in that case. There is another issue: the comment doesn't match
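
The bug described above follows a common pattern: a sub-comparison returns as soon as *either* element has some property. The generic sketch below uses hypothetical names (`struct cand`, `pinned`, `priority`) and is not the ira-color.c code. In the fixed variant, the property decides the order only when exactly one element has it. Otherwise the comparison falls through to the next key.

```c
/* Hypothetical candidate with a boolean property and a priority. */
struct cand {
    int pinned;
    int priority;
};

/* Broken: if both candidates are pinned, cmp(a, b) and cmp(b, a)
   both return -1, i.e. A < B < A. */
int cmp_cand_broken(const void *pa, const void *pb)
{
    const struct cand *a = pa, *b = pb;
    if (a->pinned)
        return -1;
    if (b->pinned)
        return 1;
    /* Higher priority first. */
    return (b->priority > a->priority) - (b->priority < a->priority);
}

/* Fixed: the property only decides when exactly one element has it;
   otherwise fall through to the following sub-comparison. */
int cmp_cand_fixed(const void *pa, const void *pb)
{
    const struct cand *a = pa, *b = pb;
    if (a->pinned != b->pinned)
        return a->pinned ? -1 : 1;
    /* Higher priority first. */
    return (b->priority > a->priority) - (b->priority < a->priority);
}
```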

Re: [PATCH] Fix PR82396: qsort comparator non-negative on sorted output

2017-10-05 Thread Alexander Monakov
On Thu, 5 Oct 2017, Maxim Kuvyrkov wrote: > I'm still working on analysis, but it appears to me that Alexander's patch > (current state of trunk) fails qsort check due to not being symmetric for > load/store analysis (write == 0 or write == 1) in comparisons with > "irrelevant" instructions. Wilco

Re: [PATCH 0/5] OpenMP/PTX: improve correctness in SIMD regions

2017-02-01 Thread Alexander Monakov
Hi, Earlier Richard mentioned the possibility to special-case GOMP_SIMT_ENTER to allow passing privatized variables to it by reference without making them addressable. I now see that such special-casing is already done for IFN_ATOMIC_COMPARE_EXCHANGE in tree-ssa.c: execute_update_addresses_taken

Re: [PATCH 0/5] OpenMP/PTX: improve correctness in SIMD regions

2017-02-01 Thread Alexander Monakov
On Wed, 1 Feb 2017, Jakub Jelinek wrote: > IFN_ASAN_POISON is treated that way too. That also means that if a > variable is previously addressable and the only spot that takes its address > is that IFN, it can be rewritten into SSA form, but the IFN has to be > adjusted to something different whic

Re: [PATCH 0/5] OpenMP/PTX: improve correctness in SIMD regions

2017-02-01 Thread Alexander Monakov
On Wed, 1 Feb 2017, Jakub Jelinek wrote: > > Yes; I imagine the approach taken in patch 2/5 can be extended to achieve > > this. > > That is, instead of just storing a flag 'bool in_simtreg' in struct loop, > > store > > pointers to corresponding SIMT_ENTER/EXIT gimple statements, use a similar >

Re: [PATCH] Fix exgettext to handle multi-line help texts from *.opt files (PR translation/78745)

2017-02-16 Thread Alexander Monakov
On Thu, 16 Feb 2017, Thomas Schwinge wrote: > On Mon, 9 Jan 2017 17:21:41 +0100, I wrote: > > On Thu, 29 Dec 2016 16:15:01 +0100, Jakub Jelinek wrote: > > > PR translation/78745 > > > * exgettext: Handle multi-line help texts in *.opt files. > > > > With this committed in r243981, I noticed t

Re: [PATCH] Fix exgettext to handle multi-line help texts from *.opt files (PR translation/78745)

2017-02-16 Thread Alexander Monakov
On Thu, 16 Feb 2017, Jakub Jelinek wrote: > On Thu, Feb 16, 2017 at 01:56:15PM +0300, Alexander Monakov wrote: > Are you sure you can't have them in *.c file (e.g. by setting some variable > to a spec string or similar)? > I think it is better to scan all those files. Hm, proba

Re: [gomp4] adjust num_gangs and add a diagnostic for unsupported num_workers

2017-02-17 Thread Alexander Monakov
On Fri, 17 Feb 2017, Cesar Philippidis wrote: > > And then, I don't specifically have a problem with discontinuing CUDA 5.5 > > support, and require 6.5, for example, but that should be a conscious > > decision. > > We should probably ditch CUDA 5.5. In fact, according to trunk's cuda.h, > it requ

Re: Improving code generation in the nvptx back end

2017-02-20 Thread Alexander Monakov
On Fri, 17 Feb 2017, Thomas Schwinge wrote: > On Fri, 17 Feb 2017 14:00:09 +0100, I wrote: > > [...] for "normal" functions there is no reason to use the > > ".param" space for passing arguments in and out of functions. We can > > then get rid of the boilerplate code to move ".param %in_ar*" into

Re: [ptx] debug info

2016-03-09 Thread Alexander Monakov
Hello Nathan, On Wed, 9 Mar 2016, Nathan Sidwell wrote: > I've committed this to trunk, to remove the squashing of debug information. > It appears to function correctly. > > I'd had this patch for a while, but forgot to commit it. The preceding code special-casing response to -gstabs can also be

Re: [ptx] debug info

2016-03-09 Thread Alexander Monakov
On Wed, 9 Mar 2016, Nathan Sidwell wrote: > On 03/09/16 09:55, Alexander Monakov wrote: > > The preceding code special-casing response to -gstabs can also be removed > > after this patch. Should I submit the (trivial) removal patch? > > No. I found that necessary to stop

Re: [ptx] debug info

2016-03-10 Thread Alexander Monakov
On Wed, 9 Mar 2016, Nathan Sidwell wrote: > > Furthermore, this is not useful without support in libgomp/plugin-nvptx.c > > and nvptx-none-run.c (PTX JIT does not propagate lineinfo by default). > > Would you like me to submit patches for those? > > please. Here's the pull request for nvptx-run.c

Re: [ptx] debug info

2016-03-10 Thread Alexander Monakov
On Thu, 10 Mar 2016, Nathan Sidwell wrote: > Hm, something must have changed since I found that sorry neccessary. As I already said in my opening sentence (not quoted in your response), you removed the unnecessary override. This is exactly what lets toplevel code see requested debug format now,

Re: [01/05] Fix PR 64411

2016-03-14 Thread Alexander Monakov
On Mon, 14 Mar 2016, Andrey Belevantsev wrote: > In this case, we get an inconsistency between the sched-deps interface, saying > we can't move an insn writing the si register through a vector insn, and the > liveness analysis, saying we can. The latter doesn't take into account > implicit_reg_pen

Re: [02/05] Fix PR 63384

2016-03-14 Thread Alexander Monakov
On Mon, 14 Mar 2016, Andrey Belevantsev wrote: > Here we're looping because we decrease the counter of the insns we still can > issue on a DEBUG_INSN thus rendering the counter negative. The fix is to not > count debug insns in the corresponding code. The selective scheduling is > known to spoil

Re: [03/05] Fix PR 66660

2016-03-14 Thread Alexander Monakov
On Mon, 14 Mar 2016, Andrey Belevantsev wrote: > We speculate an insn in the PR but we do not make a check for it though we > should. The thing that broke this was the fix for PR 45472. In that pr, we > have moved a volatile insn too far up because we failed to merge the bits > describing its vol

Re: [04/05] Fix PR 69032

2016-03-14 Thread Alexander Monakov
On Mon, 14 Mar 2016, Andrey Belevantsev wrote: > We fail to find the proper seqno for the fresh bookkeeping copy in this PR. > The problem is that in get_seqno_by_preds we are iterating over bb from the > given insn backwards up to the first bb insn. We skip the initial insn when > iterating over

Re: [02/05] Fix PR 63384

2016-03-15 Thread Alexander Monakov
On Tue, 15 Mar 2016, Marek Polacek wrote: > This test fails for me due to > cc1plus: warning: var-tracking-assignments changes selective scheduling Thanks for the heads-up Marek, and sorry for the trouble. Like I said in the adjacent reply, the warning is expected (I didn't realize the testsuite

Re: [02/05] Fix PR 63384

2016-03-15 Thread Alexander Monakov
On Tue, 15 Mar 2016, Andrey Belevantsev wrote: > On 15.03.2016 20:44, Alexander Monakov wrote: > > On Tue, 15 Mar 2016, Marek Polacek wrote: > > > This test fails for me due to > > > cc1plus: warning: var-tracking-assignments changes selective scheduling > > >

[gomp-nvptx 0/7] Various fixes

2016-03-18 Thread Alexander Monakov
error reporting (this is a regression that is also visible on trunk with OpenACC offloading), and patch 4 is a slightly more comprehensive fix to nvptx debuginfo generation. Alexander Monakov (7): libgomp: remove paste error in gomp_team_barrier_wait_end nvptx libgcc: use attribute shared

[gomp-nvptx 1/7] libgomp: remove paste error in gomp_team_barrier_wait_end

2016-03-18 Thread Alexander Monakov
* config/nvptx/bar.c: Remove wrong invocation of gomp_barrier_wait_end from gomp_team_barrier_wait_end. --- libgomp/ChangeLog.gomp-nvptx | 5 + libgomp/config/nvptx/bar.c | 2 -- 2 files changed, 5 insertions(+), 2 deletions(-) diff --git a/libgomp/config/nvptx/bar.c b/libgo

[gomp-nvptx 2/7] nvptx libgcc: use attribute shared

2016-03-18 Thread Alexander Monakov
* config/nvptx/crt0.c (__nvptx_stacks): Define in C. Use it... (__nvptx_uni): Ditto. (__main): ...here instead of inline asm. * config/nvptx/stacks.c (__nvptx_stacks): Define in C. (__nvptx_uni): Ditto. --- libgcc/ChangeLog.gomp-nvptx | 8 libgcc

[gomp-nvptx 6/7] nvptx backend: change mul.u32 to mul.lo.u32

2016-03-19 Thread Alexander Monakov
Recent testing uncovered that PTX JIT may reject attempts to use 'mul.u32' as a non-widening 32-bit multiply instruction. Use 'mul.lo.u32' to fix 32-bit code generation and conform to the PTX spec better. * config/nvptx/nvptx.c (nvptx_init_unisimt_predicate): Emit 'mul.lo.u32' ins
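
For reference, PTX `mul` on 32-bit integer operands takes a `.lo`, `.hi` or `.wide` qualifier. These select the low half, the high half, or the full 64-bit product. The C model below mirrors only the arithmetic, not PTX code generation, and the function names are hypothetical.

```c
#include <stdint.h>

/* Model of mul.lo.u32: low 32 bits of the full product. This is the
   ordinary non-widening unsigned 32-bit multiply. */
uint32_t ptx_mul_lo_u32(uint32_t a, uint32_t b)
{
    return (uint32_t) ((uint64_t) a * b);
}

/* Model of mul.hi.u32: high 32 bits of the full product. */
uint32_t ptx_mul_hi_u32(uint32_t a, uint32_t b)
{
    return (uint32_t) (((uint64_t) a * b) >> 32);
}

/* Model of mul.wide.u32: the complete 64-bit product. */
uint64_t ptx_mul_wide_u32(uint32_t a, uint32_t b)
{
    return (uint64_t) a * b;
}
```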

[gomp-nvptx 3/7] libgomp plugin: make cuMemFreeHost error non-fatal

2016-03-19 Thread Alexander Monakov
Unlike cuMemFree and other resource-releasing functions called on exit, cuMemFreeHost appears to re-report errors encountered in kernel launch. This leads to a deadlock after GOMP_PLUGIN_fatal is reentered. While the behavior on libgomp side is suboptimal (there's no need to call resource-releasin

[gomp-nvptx 4/7] nvptx backend: re-enable line info generation

2016-03-19 Thread Alexander Monakov
* config/nvptx/nvptx.c (nvptx_option_override): Remove custom handling of debug info options. --- gcc/ChangeLog.gomp-nvptx | 5 + gcc/config/nvptx/nvptx.c | 9 - 2 files changed, 5 insertions(+), 9 deletions(-) diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/n

[gomp-nvptx 7/7] nvptx backend: define STACK_SIZE_MODE

2016-03-19 Thread Alexander Monakov
The default definition of STACK_SIZE_MODE is word_mode, which is DImode on NVPTX. However, the stack pointer mode matches the pointer mode, so it needs to be SImode on the 32-bit NVPTX ABI. Define it to Pmode to fix 32-bit code generation. * config/nvptx/nvptx.h (STACK_SIZE_MODE): Define. --- gcc/ChangeLog

[gomp-nvptx 5/7] nvptx backend: use POINTER_SIZE instead of BITS_PER_WORD

2016-03-20 Thread Alexander Monakov
POINTER_SIZE is the proper macro to retrieve pointer size in bits for the target ABI, but new code incorrectly used BITS_PER_WORD, breaking 32-bit code generation. * config/nvptx/nvptx.c (nvptx_init_unisimt_predicate): Use POINTER_SIZE instead of BITS_PER_WORD. (nvptx_decla

Re: [PATCH 1/4, libgomp] Resolve deadlock on plugin exit

2016-03-21 Thread Alexander Monakov
Hi, I'd like to note that I have a small patch on gomp-nvptx branch that deals with the worst user-visible regression in a non-intrusive manner: https://gcc.gnu.org/ml/gcc-patches/2016-03/msg01109.html Alexander

Re: out of bounds access in insn-automata.c

2016-03-24 Thread Alexander Monakov
Hi, On Thu, 24 Mar 2016, Bernd Schmidt wrote: > On 03/24/2016 11:17 AM, Aldy Hernandez wrote: > > On 03/23/2016 10:25 AM, Bernd Schmidt wrote: > > > It looks like this block of code is written by a helper function that is > > > really intended for other purposes than for maximal_insn_latency. Migh

[gomp-nvptx 2/2] libgomp: avoid triggering a driver bug on sm_50

2016-03-24 Thread Alexander Monakov
Loops lacking exit edges can trigger an NVIDIA driver sm_50 code generation bug, which manifested as stack pointer (SASS register R1) corruption in this case. Adjusting source by hand to arrange a cheap exit branch seems to be the most reasonable workaround. NVIDIA bug ID 200177879. * con

[gomp-nvptx 0/2] gomp_nvptx_main tweaks

2016-03-24 Thread Alexander Monakov
I have committed two nvptx libgomp tweaks to amonakov/gomp-nvptx branch, one to improve efficiency, another to workaround a Maxwell-specific driver bug. Alexander Monakov (2): libgomp: avoid malloc calls in gomp_nvptx_main libgomp: avoid triggering a driver bug on sm_50 libgomp

[gomp-nvptx 1/2] libgomp: avoid malloc calls in gomp_nvptx_main

2016-03-24 Thread Alexander Monakov
Avoid calling malloc where it's easy to use stack storage instead: device malloc is very slow in CUDA. This cuts about 60-80 microseconds from target region entry/exit time, slimming down empty target regions from ~95 to ~17 microseconds (as measured on a GTX Titan). * config/nvptx/target

[PATCH] nvptx backend: fix and streamline symbol renaming

2016-03-31 Thread Alexander Monakov
This fixes a bug in the NVPTX backend where taking the address of a function renamed by the backend (e.g. 'call' or 'malloc') would wrongly use the original name. Now all decl renaming is handled up front via TARGET_MANGLE_DECL_ASSEMBLER_NAME hook, which becomes the only caller of nvptx_name_replac
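The bug class described above arises when the direct-call path and the address-taken path compute a symbol's name independently. In GCC the single entry point is the TARGET_MANGLE_DECL_ASSEMBLER_NAME hook. The C below is only a stand-alone model of the "rename in exactly one place" idea and not the backend code. The `__nvptx_` replacement names and the `mangle_assembler_name` function are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical rename table; the "__nvptx_" prefix is an assumption for
   this sketch, not quoted from the patch.  */
static const struct
{
  const char *from;
  const char *to;
} renames[] = {
  { "call", "__nvptx_call" },
  { "malloc", "__nvptx_malloc" },
};

/* Single point of renaming: every consumer (direct calls, taking a
   function's address, emitting declarations) asks this function for the
   assembler name, so no path can fall back to the original name.  */
static const char *
mangle_assembler_name (const char *name)
{
  assert (name != NULL);
  for (size_t i = 0; i < sizeof renames / sizeof renames[0]; i++)
    if (strcmp (name, renames[i].from) == 0)
      return renames[i].to;
  return name;
}
```

Names not in the table pass through unchanged, so callers can apply the function unconditionally.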

[committed] nvptx: fix -moptimize help text

2016-04-15 Thread Alexander Monakov
Hello, I have committed to trunk as obvious the following patch to add a missing period at the end of help text for the '-moptimize' NVPTX backend option. Alexander * config/nvptx/nvptx.opt (moptimize): Add a period at end of help text. --- gcc/config/nvptx/nvptx.opt +++ gcc/config/nvpt

Re: [PATCH] [AArch64] support -mfentry feature for arm64

2016-04-15 Thread Alexander Monakov
On Fri, 15 Apr 2016, Michael Matz wrote: > On Thu, 14 Apr 2016, Maxim Kuvyrkov wrote: > > > It appears that implementing -fprolog-pad=N option in GCC will not > > enable kernel live-patching support for AArch64. The proposal for the > > option was to make GCC output a given number of NOPs at th

Re: [PATCH] [AArch64] support -mfentry feature for arm64

2016-04-17 Thread Alexander Monakov
On Fri, 15 Apr 2016, Alexander Monakov wrote: > On Fri, 15 Apr 2016, Michael Matz wrote: > > Replace first nop with a breakpoint, handle rest of patching in breakpoint > > handler, patch breakpoint insn last, no need to atomically patch multiple > > instructions. >

Re: [PATCH] [AArch64] support -mfentry feature for arm64

2016-04-18 Thread Alexander Monakov
On Thu, 14 Apr 2016, Szabolcs Nagy wrote: > looking at [2] i don't see why > > func: > mov x9, x30 > bl _tracefunc > > > is not good for the kernel. > > mov x9, x30 is a nop at function entry, so in > theory 4 byte atomic write should be enough > to enable/disable tracing. Overwriting x9

Re: [PATCH] [AArch64] support -mfentry feature for arm64

2016-04-18 Thread Alexander Monakov
On Mon, 18 Apr 2016, Ramana Radhakrishnan wrote: > On Mon, Apr 18, 2016 at 2:26 PM, Alexander Monakov wrote: > > On Thu, 14 Apr 2016, Szabolcs Nagy wrote: > >> looking at [2] i don't see why > >> > >> func: > >> mov x9, x30 > >>

Re: [PATCH] [AArch64] support -mfentry feature for arm64

2016-04-18 Thread Alexander Monakov
On Mon, 18 Apr 2016, Ramana Radhakrishnan wrote: > > - and GCC is not smart enough to be aware that intra-TU calls to 'func' (the > > function we're instrumenting) don't touch x16/x17. And GCC should be that > > smart, if it's not, it's a bug, right? :) > > > > That it already is - IIRC. Oth

Re: [PATCH] [AArch64] support -mfentry feature for arm64

2016-04-18 Thread Alexander Monakov
On Mon, 18 Apr 2016, Szabolcs Nagy wrote: > On 18/04/16 14:26, Alexander Monakov wrote: > > On Thu, 14 Apr 2016, Szabolcs Nagy wrote: > >> looking at [2] i don't see why > >> > >> func: > >> mov x9, x30 > >> bl _tracefunc > >>

[gomp-nvptx] nvptx backend: write_omp_entry cosmetics

2016-04-18 Thread Alexander Monakov
This brings write_omp_entry code a bit closer in style to the rest of nvptx.c by using write_fn_marker, and hopefully makes it a bit clearer. No functional change. * config/nvptx/nvptx.c (write_omp_entry): Adjust. (nvptx_declare_function_name): Adjust. --- Applied to amonakov/gomp-

Re: [PATCH] [AArch64] support -mfentry feature for arm64

2016-04-18 Thread Alexander Monakov
On Tue, 19 Apr 2016, AKASHI Takahiro wrote: > > > But if Szabolcs' two-instruction > > > sequence in the adjacent subthread is sufficient, this is moot. > > > > . It can also be solved by having just one NOP after the function label, > > and a number of them before, then no thread can be in the

Re: [PATCH] [AArch64] support -mfentry feature for arm64

2016-04-18 Thread Alexander Monakov
On Tue, 19 Apr 2016, AKASHI Takahiro wrote: > > looking at [2] i don't see why > > > > func: > > mov x9, x30 > > bl _tracefunc > > > > Actually, > mov x9, x30 > bl _tracefunc > mov x30, x9 > I think here Szabolcs' point was that the last instruction can be eliminated: _tr

[gomp-nvptx] doc: document nvptx shared attribute

2016-04-19 Thread Alexander Monakov
* doc/extend.texi (Nvidia PTX Variable Attributes): New section. --- Applied to amonakov/gomp-nvptx branch. diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi index e11ce4d..5eeb179 100644 --- a/gcc/doc/extend.texi +++ b/gcc/doc/extend.texi @@ -5469,6 +5469,7 @@ attributes. * MeP Vari

Re: gomp_target_fini

2016-04-19 Thread Alexander Monakov
On Tue, 19 Apr 2016, Thomas Schwinge wrote: > Well, I certainly had done at least some thinking before proposing this: > we're talking about the libgomp "fatal exit" function, called when > something has gone very wrong, and we're about to terminate the process, > because there's no hope to recover

OpenMP offloading to NVPTX: backend patches

2016-04-20 Thread Alexander Monakov
Hello! In responses to this email, I'll be posting 9 NVPTX-specific patches that are required for enabling OpenMP offloading. I intend to post corresponding libgomp and middle-end changes that make these useful a bit later. The patches are generated by taking a diff on amonakov/gomp-nvptx git br

[PATCH] new patterns for OpenMP SIMD-via-SIMT

2016-04-20 Thread Alexander Monakov
trivially folded when compiling for non-SIMT execution; otherwise they are kept, and expanded into these insns. Previously posted here: [gomp-nvptx 01/13] nvptx backend: new patterns for OpenMP SIMD-via-SIMT https://gcc.gnu.org/ml/gcc-patches/2016-01/msg01550.html 2016-01-17 Alexander Monakov

[PATCH] new target hook: TARGET_SIMT_VF

2016-04-20 Thread Alexander Monakov
ET_SIMT_VF https://gcc.gnu.org/ml/gcc-patches/2015-12/msg00122.html 2015-12-09 Alexander Monakov * config/nvptx/nvptx.c (nvptx_simt_vf): New. (TARGET_SIMT_VF): Define. * doc/tm.texi: Regenerate. * doc/tm.texi.in: (TARGET_SIMT_VF): New hook. * target.def:

[PATCH] add support for placing variables in shared memory

2016-04-20 Thread Alexander Monakov
ptx shared attribute https://gcc.gnu.org/ml/gcc-patches/2016-04/msg00940.html 2016-04-19 Alexander Monakov * doc/extend.texi (Nvidia PTX Variable Attributes): New section. 2016-01-17 Alexander Monakov * config/nvptx/nvptx.c (nvptx_encode_section_info): Hand
