[PATCH 1/8] nvptx -msoft-stack

2016-06-09 Thread Alexander Monakov
This is a respin of the recently reviewed -msoft-stack patch that addresses review feedback and adds required libgcc changes. gcc/: * config/nvptx/nvptx-protos.h (nvptx_output_set_softstack): Declare. * config/nvptx/nvptx.c: (need_softstack_decl): New variable. (init_softst

[PATCH 8/8] nvptx: handle OpenMP "omp target entrypoint"

2016-06-09 Thread Alexander Monakov
This patch implements emission of OpenMP target region entrypoints: the compiler emits the target function with '$impl' appended to the name, and under the original name it emits a short entry sequence that sets up shared memory arrays and calls the target function via 'gomp_nvptx_main' (which is i

[PATCH 3/8] nvptx -muniform-simt

2016-06-09 Thread Alexander Monakov
This patch implements -muniform-simt code generation option, which is used to emit code for OpenMP offloading. The goal is to emit code that can either execute "normally", or can execute in a way that keeps all lanes in a given warp active, their local state synchronized, and observable effects fr

Re: [PATCH] nvptx per-warp compiler-defined stacks (-msoft-stack)

2016-06-09 Thread Alexander Monakov
On Thu, 9 Jun 2016, Nathan Sidwell wrote: > > (define_expand "restore_stack_block" > >[(match_operand 0 "register_operand" "") > > (match_operand 1 "register_operand" "")] > > you've not addressed my previous comments about this. To be clear -- do you mean that "restore_stack_block" shou

Re: Remove match.pd pattern dups in favor of using :c

2016-06-10 Thread Alexander Monakov
On Wed, 1 Jun 2016, Richard Biener wrote: > > On Wed, 1 Jun 2016, Richard Biener wrote: > > > 2016-06-01 Richard Biener > > > > > > * match.pd ((A & B) - (A & ~B) -> B - (A ^ B)): Add missing :c. > > > (relational patterns): Use :c to avoid pattern duplications. > > > > Should the same tre

Re: [PATCH 3/8] nvptx -muniform-simt

2016-06-13 Thread Alexander Monakov
On Sun, 12 Jun 2016, Sandra Loosemore wrote: > On 06/09/2016 10:53 AM, Alexander Monakov wrote: > > +@item -muniform-simt > > +@opindex muniform-simt > > +Generate code that allows to keep all lanes in each warp active, even when > > Allows *what* to keep? E.g. wha

Re: [PATCH, IA64, RFT]: Implement PR 71242, Missing built-in functions for float128 NaNs

2016-06-16 Thread Alexander Monakov
Hi, > 2016-06-12 Uros Bizjak > > PR target/71242 > * config/ia64/ia64.c (enum ia64_builtins) [IA64_BUILTIN_NANQ]: New. > [IA64_BUILTIN_NANSQ]: Ditto. > (ia64_fold_builtin): New function. > (TARGET_FOLD_BUILTIN): New define. > (ia64_init_builtins) Declare const_string_ty

Re: [PATCH] Handle undefined extern vars in output_in_order

2016-06-16 Thread Alexander Monakov
On Thu, 9 Jun 2016, Alexander Monakov wrote: > Hi, > > This patch teaches cgraphunit.c:output_in_order to output undefined external > variables via assemble_undefined_decl. At the moment that is only done for > -ftoplevel-reorder in varpool.c:symbol_table::output_variables. Thi

Re: [PATCH] Handle undefined extern vars in output_in_order

2016-06-16 Thread Alexander Monakov
On Thu, 16 Jun 2016, Jan Hubicka wrote: > > On Thu, 9 Jun 2016, Alexander Monakov wrote: > + FOR_EACH_VARIABLE (pv) [snip] > + i = pv->order; > + gcc_assert (nodes[i].kind == ORDER_UNDEFINED); > + nodes[i].kind = pv->definition ? ORDER_VAR : ORDER_VAR_UNDE

Re: [PATCH] Handle undefined extern vars in output_in_order

2016-06-16 Thread Alexander Monakov
On Thu, 16 Jun 2016, Jan Hubicka wrote: > I see, order is created at a time variable is added to symbol table (not at > time when definition is given). So we should have order everywhere. > Patch is OK Thanks! If you don't mind a quick followup question: now that both FOR_EACH_VARIABLE loops in

Re: [PATCH 3/8] nvptx -muniform-simt

2016-06-22 Thread Alexander Monakov
Ping. On Mon, 13 Jun 2016, Alexander Monakov wrote: > On Sun, 12 Jun 2016, Sandra Loosemore wrote: > > On 06/09/2016 10:53 AM, Alexander Monakov wrote: > > > +@item -muniform-simt > > > +@opindex muniform-simt > > > +Generate code that allows to keep

Re: [PATCH] Handle undefined extern vars in output_in_order

2016-06-23 Thread Alexander Monakov
Hi, I've discovered that this assert in my patch was too restrictive: + if (DECL_HAS_VALUE_EXPR_P (pv->decl)) + { + gcc_checking_assert (lookup_attribute ("omp declare target link", +DECL_ATTRIBUTES (pv->decl))); Testing for the

Re: [PATCH] PR middle-end/71524: IFUNC resolver may resolve to a non-local function

2016-06-26 Thread Alexander Monakov
On Sat, 25 Jun 2016, H.J. Lu wrote: > The resolver for ifunc functions might resolve to a non-local function. I think the explanation doesn't match the testcase, in which all three functions: the resolver, the symbol being resolved, and the ultimate resolution are all static. I don't think there w

Re: [PATCH] PR middle-end/71524: IFUNC resolver may resolve to a non-local function

2016-06-26 Thread Alexander Monakov
On Sun, 26 Jun 2016, H.J. Lu wrote: > On Sun, Jun 26, 2016 at 12:49 AM, Alexander Monakov > wrote: > > On Sat, 25 Jun 2016, H.J. Lu wrote: > >> The resolver for ifunc functions might resolve to a non-local function. > > > > I think the explanation doesn't

Re: [PATCH] ira-color: fix allocno_priority_compare_func for qsort (PR 82395)

2017-10-19 Thread Alexander Monakov
Ping. On Thu, 5 Oct 2017, Alexander Monakov wrote: > In ira-color.c, qsort comparator allocno_priority_compare_func lacks anti- > commutativity and can indicate A < B < A if both allocnos satisfy > non_spilled_static_chain_regno_p. It should fall down to following > sub-compar

Re: [RFC PATCH] Coalesce host to device transfers in libgomp

2017-10-24 Thread Alexander Monakov
On Tue, 24 Oct 2017, Jakub Jelinek wrote: > loop transferring the addresses or firstprivate_int values to the device > - where we issued mapnum host2dev transfers each just pointer-sized > when we could have just prepared all the pointers in an array and host2dev > copy them all together. Can you p

Re: [RFC PATCH] Coalesce host to device transfers in libgomp

2017-10-24 Thread Alexander Monakov
On Tue, 24 Oct 2017, Jakub Jelinek wrote: > > Why did you choose the 32KB and 4KB limits? I wonder if that would have > > any impact on firstprivate_int values. If this proves to be effective, > > it seems like we should be able to eliminate GOMP_MAP_FIRSTPRIVATE_INT > > altogether. > > The thing i

Re: [PATCH 1/2] x86,s390: add compiler memory barriers when expanding atomic_thread_fence (PR 80640)

2017-07-11 Thread Alexander Monakov
On Thu, 8 Jun 2017, Alexander Monakov wrote: > Ping^3. Ping^4: https://gcc.gnu.org/ml/gcc-patches/2017-05/msg00782.html This is a wrong-code issue with C11 atomics: even if no machine barrier is needed for a given fence type on this architecture, a compiler barrier must be present in

[PATCH] match.pd: reassociate multiplications with constants

2017-07-13 Thread Alexander Monakov
Hi, This is a followup to https://gcc.gnu.org/ml/gcc-patches/2017-05/msg01545.html Recently due to a fix for PR 80800 GCC has lost the ability to reassociate signed multiplications chains to go from 'X * CST1 * Y * CST2' to 'X * Y * (CST1 * CST2)'. The fix to that PR prevents extract_muldiv from

Re: [PATCH] match.pd: reassociate multiplications with constants

2017-07-13 Thread Alexander Monakov
On Thu, 13 Jul 2017, Marc Glisse wrote: > I notice that we do not turn (X*10)*10 into X*100 in GIMPLE. Sorry, could you clarify what you mean here? I think we certainly do that, just not via match.pd, but in 'associate:' case of fold_binary_loc. > Relying on inner expressions being folded can be

Re: [PATCH] match.pd: reassociate multiplications with constants

2017-07-15 Thread Alexander Monakov
On Thu, 13 Jul 2017, Marc Glisse wrote: > I notice that we do not turn (X*10)*10 into X*100 in GIMPLE [...] I've completely missed that. Posting another patch to address that. > Relying on inner expressions being folded can be slightly dangerous, > especially for generic IIRC. It seems easy enou

Re: [PATCH] match.pd: reassociate multiplications with constants

2017-07-15 Thread Alexander Monakov
On Thu, 13 Jul 2017, Marc Glisse wrote: > X*big*big where abs(big*big)>abs(INT_MIN) can be optimized to 0 I'm not sure that would be a win, eliminating X prevents the compiler from deducing that X must be zero (if overflow invokes undefined behavior). > the only hard case is when the product of t

[PATCH 3/6] lra-assigns.c: fix pseudo_compare_func

2017-07-15 Thread Alexander Monakov
This comparator lacks anti-commutativity and can indicate A < B < A if both A and B satisfy non_spilled_static_chain_regno_p. Proceed to following tie-breakers in that case. (it looks like the code incorrectly assumes that at most one register in the array will satisfy non_spilled_static_chain_reg

[PATCH 5/6] haifa-sched.c: give up qsort checking when autoprefetch heuristic is in use

2017-07-15 Thread Alexander Monakov
The autopref_rank_for_schedule sub-comparator and its subroutine autopref_rank_data lack transitivity. Skip checking if they are in use. This heuristic is disabled by default everywhere except ARM and AArch64, so on other targets this does not suppress checking all the time. * haifa-sch

[PATCH 2/6] gimple-ssa-store-merging.c: fix sort_by_bitpos

2017-07-15 Thread Alexander Monakov
This qsort comparator lacks anti-commutativity and can indicate A < B < A if A and B have the same bitpos. Return 0 in that case. * gimple-ssa-store-merging.c (sort_by_bitpos): Return 0 on equal bitpos. --- gcc/gimple-ssa-store-merging.c | 6 +++--- 1 file changed, 3 insertions(+), 3 del

[PATCH 1/6] tree-vrp: fix compare_assert_loc qsort comparator

2017-07-15 Thread Alexander Monakov
Subtracting values to produce a -/0/+ comparison value only works when original values have limited range. Otherwise it leads to broken comparator that indicates 0 < 0x4000 < 0x8000 < 0. Yuri posted an equivalent patch just a few hours ago: https://gcc.gnu.org/ml/gcc-patches/2017-07/msg00
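
The failure mode described in this snippet can be sketched in C. The function names are hypothetical and not taken from the patch; the second function is one common overflow-free idiom, which may or may not be what the patch itself uses.

```c
#include <limits.h>

/* Hypothetical subtraction-based comparator.  It is only valid while
   a - b cannot overflow, i.e. when both inputs have a limited range.  */
static int cmp_by_subtraction(int a, int b)
{
    return a - b;
}

/* Overflow-free three-way comparison: yields -1, 0 or +1 for any ints.  */
static int cmp_three_way(int a, int b)
{
    return (a > b) - (a < b);
}
```

The second form never computes a difference, so it stays well defined even for INT_MIN and INT_MAX.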

[PATCH 0/6] qsort comparator consistency fixes

2017-07-15 Thread Alexander Monakov
Hello, (previous thread here: https://gcc.gnu.org/ml/gcc-patches/2016-07/msg00944.html ) we still have a few places in GCC where the comparator function passed to qsort is not actually a proper sorting predicate. Most commonly it fails to impose total ordering by lacking transitivity. It's usef
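
What "a proper sorting predicate" means can be illustrated with a brute-force check. This is only a sketch under assumptions: the helper names are invented here, and it is not the verifier posted in this series. It tests every pair for anti-commutativity and every triple for transitivity.

```c
#include <stddef.h>

typedef int (*cmp_fn)(const void *, const void *);

static int sign_of(int v)
{
    return (v > 0) - (v < 0);
}

/* Returns 1 if cmp orders the n elements at base consistently, else 0.
   Brute force over all pairs and triples, so O(n^3) comparator calls.  */
static int comparator_is_consistent(const void *base, size_t n, size_t size,
                                    cmp_fn cmp)
{
    const char *p = base;
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++) {
            int ij = sign_of(cmp(p + i * size, p + j * size));
            int ji = sign_of(cmp(p + j * size, p + i * size));
            if (ij != -ji)
                return 0;               /* anti-commutativity violated */
            for (size_t k = 0; k < n; k++) {
                int jk = sign_of(cmp(p + j * size, p + k * size));
                int ik = sign_of(cmp(p + i * size, p + k * size));
                if (ij < 0 && jk < 0 && ik >= 0)
                    return 0;           /* A < B < C but not A < C */
                if (ij == 0 && ik != jk)
                    return 0;           /* equal elements order differently */
            }
        }
    return 1;
}

/* A valid three-way comparator on ints.  */
static int cmp_int_ok(const void *a, const void *b)
{
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

/* An invalid comparator: never reports equality, so A < B < A is possible.  */
static int cmp_int_never_equal(const void *a, const void *b)
{
    int x = *(const int *)a, y = *(const int *)b;
    return x < y ? -1 : 1;
}
```

On any array, the second comparator already fails when an element is compared with itself, because both directions return 1.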

[PATCH 6/6] qsort comparator consistency checking

2017-07-15 Thread Alexander Monakov
This is the updated qsort comparator verifier. Since we have vec::qsort(cmp), the patch uses the macro argument counting trick to redirect only the four-argument invocations of qsort to qsort_chk. I realize that won't win much sympathies, but a patch doing mass-renaming of qsort in the whole GCC c
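
The macro argument counting trick mentioned here can be sketched as follows. This is an illustration under assumptions, not the patch text: PICK_5TH is a made-up name, and this qsort_chk merely counts calls and forwards to the C library, whereas the real verifier would check the comparator.

```c
#include <stdlib.h>

static int qsort_chk_calls;

/* Stand-in for the checking wrapper; it only counts and forwards.  */
static void qsort_chk(void *base, size_t n, size_t size,
                      int (*cmp)(const void *, const void *))
{
    qsort_chk_calls++;
    qsort(base, n, size, cmp);  /* macro not defined yet: the libc qsort */
}

/* Select the fifth argument.  With four user arguments the fifth is
   qsort_chk; with a single user argument (as in vec::qsort (cmp)) it is
   qsort, which is not expanded again inside its own expansion.  */
#define PICK_5TH(a1, a2, a3, a4, a5, ...) a5
#define qsort(...) PICK_5TH(__VA_ARGS__, qsort_chk, 3, 2, qsort, 0)(__VA_ARGS__)

static int cmp_ints(const void *a, const void *b)
{
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}
```

A call such as `qsort (v, 3, sizeof v[0], cmp_ints)` now reaches qsort_chk, while a two- or three-argument use would expand to `3 (...)` or `2 (...)` and fail to compile.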

[PATCH 4/6] lra-assigns.c: give up on qsort checking in assign_by_spills

2017-07-15 Thread Alexander Monakov
The reload_pseudo_compare_func comparator, when used from assign_by_spills, can be non-transitive, indicating A < B < C < A if both A and C satisfy !bitmap_bit_p (&non_reload_pseudos, rAC), but B does not. This function was originally a proper comparator, and the problematic clause was added to fi

Re: [PATCH] match.pd: reassociate multiplications with constants

2017-07-17 Thread Alexander Monakov
On Mon, 17 Jul 2017, Marc Glisse wrote: > > +/* Combine successive multiplications. Similar to above, but handling > > + overflow is different. */ > > +(simplify > > + (mult (mult @0 INTEGER_CST@1) INTEGER_CST@2) > > + (with { > > + bool overflow_p; > > + wide_int mul = wi::mul (@1, @2, TYP

Re: [PATCH] match.pd: reassociate multiplications with constants

2017-07-18 Thread Alexander Monakov
On Mon, 17 Jul 2017, Alexander Monakov wrote: > On Mon, 17 Jul 2017, Marc Glisse wrote: > > > +/* Combine successive multiplications. Similar to above, but handling > > > + overflow is different. */ > > > +(simplify > > > + (mult (mult @0 INTE

Re: [PATCH] match.pd: reassociate multiplications with constants

2017-07-19 Thread Alexander Monakov
On Wed, 19 Jul 2017, Richard Biener wrote: > >> --- a/gcc/match.pd > >> +++ b/gcc/match.pd > >> @@ -283,6 +283,20 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) > >> || mul != wi::min_value (TYPE_PRECISION (type), SIGNED)) > >> { build_zero_cst (type); }) > >> > >> +/* Combine successiv

Re: [PATCH v2] Add no_tail_call attribute

2017-07-19 Thread Alexander Monakov
On Wed, 19 Jul 2017, Jeff Law wrote: > > Glibc people were worried that attribute would be lost when taking a > > pointer to function > > (https://sourceware.org/ml/libc-alpha/2017-01/msg00482.html). I think > > their reasoning was that return address is a shadow argument for > > dlsym-like functio

Re: [PATCH v2] Add no_tail_call attribute

2017-07-19 Thread Alexander Monakov
On Wed, 19 Jul 2017, Jakub Jelinek wrote: > > 1) recognize dlsym by name and suppress tailcalls to it > > > >this would solve >99% cases because calling dlsym by pointer would be > > rare, > >and has the benefit of not requiring libc header changes; > > Recognizing by name is IMNSHO unde

Re: [PATCH v2] Add no_tail_call attribute

2017-07-19 Thread Alexander Monakov
On Wed, 19 Jul 2017, Yuri Gribov wrote: > So to reiterate, your logic here is that someone would wipe dlsym type > (e.g. by casting to void *), then later cast to another type which > lacks tailcall attribute. So proposed solution won't protect against > situation like this. No, it's not "my logic

Re: [PATCH v2] Add no_tail_call attribute

2017-07-19 Thread Alexander Monakov
On Wed, 19 Jul 2017, Alexander Monakov wrote: > > The one and only advantage of attribute compared to Jakubs approach > > (or yours, they share the same idea of wrapping dlsym calls) is that > > it forces user to carry it around when taking address of function. > > It

Re: [PATCH] match.pd: reassociate multiplications with constants

2017-07-20 Thread Alexander Monakov
On Thu, 20 Jul 2017, Richard Biener wrote: > >> So for saturating types isn't the issue when @1 and @2 have opposite > >> sign and the inner multiply would have saturated? > > > > No, I think the only special case is @1 == @2 == -1, otherwise either @2 is > > 0 or 1, or @1 * @2 is larger in magnitu

[PATCH] toplev: avoid recursive emergency_dump_function

2017-07-20 Thread Alexander Monakov
Hi, Segher pointed out on IRC that ICE reporting with dumps enabled got worse: if emergency_dump_function itself leads to an ICE (e.g. by segfaulting), nested ICE reporting will invoke emergency_dump_function in exactly the same context, but not only would we uselessly announce current pass again,
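
One conventional way to stop this kind of recursion is a re-entrancy guard. The sketch below is an assumption for illustration only: the snippet is cut off before it says how the patch solves the problem, and all names here are hypothetical.

```c
#include <stdbool.h>

static int dumps_started;

/* Hypothetical dump routine whose body may itself fail and re-enter
   the error path.  The static flag makes nested entries return early.  */
static void emergency_dump_sketch(void (*body)(void))
{
    static bool in_progress;
    if (in_progress)
        return;                 /* nested failure: do not dump again */
    in_progress = true;
    dumps_started++;
    if (body)
        body();
    in_progress = false;
}

/* Simulates a dump body that fails and triggers error reporting again.  */
static void failing_body(void)
{
    emergency_dump_sketch(failing_body);
}
```

With the guard in place, a failure inside the dump produces one dump attempt instead of an unbounded chain.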

[PATCH 7/6] fortran: fix pair_cmp qsort comparator

2017-07-21 Thread Alexander Monakov
Hello, The final tie-breaker in pair_cmp comparator looks strange, it correctly yields zero for equal expr->symtree->n.sym values, but for unequal values it produces 0 or 1. This would be correct for C++ STL-style comparators that require "less-than" predicate to be computed, but not for C qsort.
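
The difference between the two comparator conventions can be shown on plain ints. This is a sketch with invented names; the real pair_cmp compares symbol pointers, which are not reproduced here.

```c
/* "Less-than" predicate style, as std::sort expects: returns 0 or 1.
   Passed to qsort, a result of 0 for a > b is read as "equal".  */
static int less_than_style(const void *a, const void *b)
{
    return *(const int *)a < *(const int *)b;
}

/* Three-way style, as qsort expects: negative, zero or positive.  */
static int three_way_style(const void *a, const void *b)
{
    int x = *(const int *)a, y = *(const int *)b;
    if (x < y)
        return -1;
    if (x > y)
        return 1;
    return 0;
}
```

For the pair (2, 1) the first function returns 0, so qsort would treat the two values as equal, while the second returns a positive value as required.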

[PATCH v2 1/2] match.pd: reassociate multiplications

2017-07-21 Thread Alexander Monakov
Previous revision here: https://gcc.gnu.org/ml/gcc-patches/2017-07/msg00889.html Reassociate (X * CST) * Y to (X * Y) * CST, this pushes constants in multiplication chains to outermost factors, where they can be combined. Changed in this revision: - remove !TYPE_OVERFLOW_SANITIZED and !TYPE_SATUR

[PATCH v2 2/2] combine successive multiplications by constants

2017-07-21 Thread Alexander Monakov
Previous revision here: https://gcc.gnu.org/ml/gcc-patches/2017-07/msg01090.html Reassociate X * CST1 * CST2 to X * (CST1 * CST2). Changed in this revision: - remove the check for @2 being 0 or -1 * match.pd ((X * CST1) * CST2): Simplify to X * (CST1 * CST2). testsuite: * gcc.dg/
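
The core of the transformation is computing CST1 * CST2 at compile time and noticing when that product does not fit the type. A minimal sketch for signed int, assuming the GCC/Clang builtin __builtin_mul_overflow; the function name is hypothetical, and how the patch treats the overflowing case is not visible in the truncated snippet.

```c
#include <limits.h>
#include <stdbool.h>

/* Tries to combine the constants of (x * c1) * c2 into one factor.
   Returns true and stores c1 * c2 in *combined when the product is
   representable; returns false on overflow, in which case *combined
   must not be used.  */
static bool fold_mul_constants(int c1, int c2, int *combined)
{
    return !__builtin_mul_overflow(c1, c2, combined);
}
```

When this returns true, `x * combined` equals `(x * c1) * c2` for every x for which the original expression did not overflow.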

[PATCH] Optimize BB sorting in domwalk

2017-07-24 Thread Alexander Monakov
Profiling uses of qsort in GCC reveals that a significant portion of calls comes from domwalk.c where child nodes in dominator tree are reordered according to postorder numbering. However we know that in dominator trees the vast majority of nodes have 0, 2 or 3 children, and handling those cases s
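
The idea of handling tiny arrays without calling qsort can be sketched as below. This is an illustration under assumptions: it sorts plain ints standing in for postorder numbers, and the names are invented rather than taken from domwalk.c.

```c
#include <stdlib.h>

static int cmp_int_keys(const void *a, const void *b)
{
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

static void swap_ints(int *a, int *b)
{
    int t = *a;
    *a = *b;
    *b = t;
}

/* Sorts v[0..n-1] ascending.  Arrays of two or three elements are
   handled with a few compare-and-swap steps; larger ones use qsort.  */
static void sort_small_first(int *v, size_t n)
{
    if (n < 2)
        return;
    if (n == 2) {
        if (v[0] > v[1])
            swap_ints(&v[0], &v[1]);
        return;
    }
    if (n == 3) {
        if (v[0] > v[1])
            swap_ints(&v[0], &v[1]);
        if (v[1] > v[2])
            swap_ints(&v[1], &v[2]);
        if (v[0] > v[1])
            swap_ints(&v[0], &v[1]);
        return;
    }
    qsort(v, n, sizeof v[0], cmp_int_keys);
}
```

The three-element case needs at most three comparisons and avoids both the library call and the indirect comparator calls.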

Re: [PATCH 2/6] gimple-ssa-store-merging.c: fix sort_by_bitpos

2017-07-24 Thread Alexander Monakov
On Sat, 22 Jul 2017, Segher Boessenkool wrote: > On Sat, Jul 15, 2017 at 11:47:45PM +0300, Alexander Monakov wrote: > > --- a/gcc/gimple-ssa-store-merging.c > > +++ b/gcc/gimple-ssa-store-merging.c > > @@ -516,12 +516,12 @@ sort_by_bitpos (const void *x, const void *y) > >

Re: [PATCH] Optimize BB sorting in domwalk

2017-07-25 Thread Alexander Monakov
On Mon, 24 Jul 2017, Jeff Law wrote: > As Uli noted, we should be using std::swap. > > Can you please repost ? * domwalk.c (cmp_bb_postorder): Simplify. (sort_bbs_postorder): New function. Use it... (dom_walker::walk): ...here to optimize common cases. --- gcc/domwalk.c

Re: [PATCH] Optimize BB sorting in domwalk

2017-07-25 Thread Alexander Monakov
On Tue, 25 Jul 2017, Alexander Monakov wrote: > --- a/gcc/domwalk.c > +++ b/gcc/domwalk.c > @@ -128,19 +128,46 @@ along with GCC; see the file COPYING3. If not see > which is currently an abstraction over walking tree statements. Thus > the dominator walker is currently

Re: [PATCH 2/6] gimple-ssa-store-merging.c: fix sort_by_bitpos

2017-07-25 Thread Alexander Monakov
On Tue, 25 Jul 2017, Kyrill Tkachov wrote: > For the uses of this function the order when the bitpos is the same > does not matter, I just wanted to avoid returning zero to avoid perturbations > due to qsort. But you can't stabilize qsort in that manner, in fact by making the comparator invalid yo
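
If a deterministic order among equal keys is wanted, one conventional approach is to keep the comparator valid and break ties on a field that is unique per element. The sketch below is an assumption for illustration: the struct and its `order` field are hypothetical, and the patch in this thread simply returns 0 for equal bitpos.

```c
#include <stdlib.h>

/* Hypothetical record: `order` is a unique sequence number assigned
   when the element is created.  */
struct store_sketch {
    int bitpos;
    unsigned order;
};

/* Valid three-way comparator: primary key bitpos, then the unique
   sequence number, so distinct elements never compare equal.  */
static int cmp_bitpos_then_order(const void *x, const void *y)
{
    const struct store_sketch *a = x, *b = y;
    if (a->bitpos != b->bitpos)
        return a->bitpos < b->bitpos ? -1 : 1;
    if (a->order != b->order)
        return a->order < b->order ? -1 : 1;
    return 0;
}
```

Because the result is fully determined by the data, any correct qsort implementation produces the same output for the same input.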

Re: [PATCH 1/2] x86,s390: add compiler memory barriers when expanding atomic_thread_fence (PR 80640)

2017-07-26 Thread Alexander Monakov
On Wed, 26 Jul 2017, Jeff Law wrote: > So I think this is up to the target maintainers. I have no concerns > with enabling use of expand_asm_memory_barrier to be used outside of > optabs. So if the s390/x86 maintainers want to go forward, the optabs > changes are pre-approved. Please see the alt

Re: [PATCH] toplev: avoid recursive emergency_dump_function

2017-07-26 Thread Alexander Monakov
On Sat, 22 Jul 2017, Segher Boessenkool wrote: > On Thu, Jul 20, 2017 at 05:40:28PM +0300, Alexander Monakov wrote: > > Segher pointed out on IRC that ICE reporting with dumps enabled got worse: > > if emergency_dump_function itself leads to an ICE (e.g. by segfaulting), > >

Re: [PATCH 1/2] x86,s390: add compiler memory barriers when expanding atomic_thread_fence (PR 80640)

2017-07-26 Thread Alexander Monakov
On Wed, 26 Jul 2017, Jeff Law wrote: > I'm not sure what you mean by extraneous compiler barriers -- isn't the > worst case scenario here that the target emits them as well? So there > would be an extraneous one in that case, but that ought to be a "don't > care". Yes, exactly this. > In the mid

Re: [PATCH 1/2] x86,s390: add compiler memory barriers when expanding atomic_thread_fence (PR 80640)

2017-07-26 Thread Alexander Monakov
On Wed, 26 Jul 2017, Alexander Monakov wrote: > On Wed, 26 Jul 2017, Jeff Law wrote: > > I'm not sure what you mean by extraneous compiler barriers -- isn't the > > worst case scenario here that the target emits them as well? So there > > would be an extraneous o

Re: [PATCH 1/2] x86,s390: add compiler memory barriers when expanding atomic_thread_fence (PR 80640)

2017-07-31 Thread Alexander Monakov
On Mon, 31 Jul 2017, Jeff Law wrote: > >> In the middle end patch, do we need a barrier before the fence as well? > >> The post-fence barrier prevents reordering the fence with anything which > >> follows the fence. But do we have to also prevent reordering the fence > >> with prior instructions w

Re: [PATCH 1/2] x86,s390: add compiler memory barriers when expanding atomic_thread_fence (PR 80640)

2017-07-31 Thread Alexander Monakov
On Mon, 31 Jul 2017, Jeff Law wrote: > > Please consider that expand_mem_thread_fence is used to place fences around > > seq-cst atomic loads&stores when the backend doesn't provide a direct > > pattern. > > With compiler barriers on both sides of the machine barrier, the generated > > sequence fo

Re: [PATCH 6/6] qsort comparator consistency checking

2017-07-31 Thread Alexander Monakov
On Mon, 31 Jul 2017, Jeff Law wrote: > I must have missed something. Can't you just define > > qsort (BASE, NMEMB, SIZE, COMPARE) into > > qsort_chk (BASE, NMEMB, SIZE, COMPARE) > > That shouldn't affect the qsort from vec? Right? Or am I missing something If you do #define qsort(base, n

Re: [PATCH] toplev: avoid recursive emergency_dump_function

2017-08-02 Thread Alexander Monakov
Hello, On Thu, 20 Jul 2017, Alexander Monakov wrote: > Segher pointed out on IRC that ICE reporting with dumps enabled got worse: > if emergency_dump_function itself leads to an ICE (e.g. by segfaulting), > nested ICE reporting will invoke emergency_dump_function in exactly the >

[PATCH 1/3] optabs: ensure mem_thread_fence is a compiler barrier

2017-08-02 Thread Alexander Monakov
As recently discussed in the previous thread for PR 80640, some targets have sufficiently strong hardware memory ordering that implementation of C11 atomic fences might not need machine barriers. However, a compiler memory barrier nevertheless must be present, and at least two targets (x86, s390)

[PATCH 3/3] optabs: ensure atomic_load/stores have compiler barriers

2017-08-02 Thread Alexander Monakov
Again, like in patch 1/3, a backend might expand C11 atomic load/store to a volatile memory access if hardware memory ordering is sufficiently strong that no machine barrier is required. Nevertheless, we must ensure that compiler memory barrier(s) are present before/after the access to prevent wro

[PATCH 2/3] retire mem_signal_fence pattern

2017-08-02 Thread Alexander Monakov
Similar to mem_thread_fence issue from the patch 1/3, RTL representation of __atomic_signal_fence must be a compiler barrier. We have just one backend offering this pattern, and it does not place a compiler barrier. It does not appear useful to expand signal_fence to some kind of hardware instruc

Re: [PATCH 6/6] qsort comparator consistency checking

2017-08-02 Thread Alexander Monakov
On Wed, 2 Aug 2017, Jeff Law wrote: > Well, there's not *that* many qsort calls. My quick grep shows 94 and > its a very mechanical change. Then a poison in system.h to ensure raw > calls to qsort don't return. Any suggestion for the non-poisoned replacement? xqsort? gcc_qsort? Can you review

Re: [PATCH 6/6] qsort comparator consistency checking

2017-08-03 Thread Alexander Monakov
On Wed, 2 Aug 2017, Jeff Law wrote: > >> Well, there's not *that* many qsort calls. My quick grep shows 94 and > >> its a very mechanical change. Then a poison in system.h to ensure raw > >> calls to qsort don't return. Note that poisoning qsort outlaws vec::qsort too; it would need to be mass-

Re: [PATCH 6/6] qsort comparator consistency checking

2017-08-03 Thread Alexander Monakov
On Thu, 3 Aug 2017, Jakub Jelinek wrote: > Do we really need to rename and poison anything? qsort () in the source is > something that is most obvious to developers, so trying to force them to use > something different will just mean extra thing to learn. Yep, I'd prefer to have a solution that k

Re: [PATCH 6/6] qsort comparator consistency checking

2017-08-03 Thread Alexander Monakov
On Fri, 4 Aug 2017, Oleg Endo wrote: > > Note that with vec::qsort -> vec::sort renaming (which should be less > > controversial, STL also has std::vector::sort) > > No it doesn't? One uses std::sort from <algorithm> on a pair of random > access iterators to sort a std::vector. My mistake, but the main poi

Re: [PATCH] Simplify pow with constant

2017-08-04 Thread Alexander Monakov
On Fri, 4 Aug 2017, Wilco Dijkstra wrote: > This patch simplifies pow (C, x) into exp (x * C1), where C1 = log (C). I don't think you can do that for non-positive C. > Do this only for fast-math as accuracy is reduced. This is much faster > since pow is more complex than exp - with a current GLI

Re: [PATCH 1/3] optabs: ensure mem_thread_fence is a compiler barrier

2017-08-07 Thread Alexander Monakov
On Sat, 5 Aug 2017, Richard Sandiford wrote: > It would be simpler to test whether targetm.gen_mem_thread_fence > returns NULL. > > This feels a bit hacky though. Checking whether a generator produced no > instructions is usually the test for whether the generator FAILed, which > should normally

Re: [PATCH] i386: Don't use frame pointer without stack access

2017-08-07 Thread Alexander Monakov
On Mon, 7 Aug 2017, Michael Matz wrote: > > I am looking for a run-time test which breaks unwinder. > > I don't have one handy. Idea: make two threads, one endlessly looping in > the "frame-less" function, the other causing a signal to the first thread, > and the signal handler checking that un

Re: [PATCH 6/6] qsort comparator consistency checking

2017-08-10 Thread Alexander Monakov
On Wed, 9 Aug 2017, Jeff Law wrote: > >> The _5th macro isn't that bad either, apart from using reserved namespace > >> identifiers (it really should be something like qsort_5th and the arguments > >> shouldn't start with underscores). > > > > I didn't understand what Jeff found "ugly" about it;

Re: [PATCH v2] Simplify pow with constant

2017-08-17 Thread Alexander Monakov
On Thu, 17 Aug 2017, Wilco Dijkstra wrote: > This patch simplifies pow (C, x) into exp (x * C1) if C > 0, C1 = log (C). Note this changes the outcome for C == +Inf, x == 0 (pow is specified to return 1.0 in that case, but x * C1 == NaN). There's another existing transform with the same issue, 'p

Re: [PATCH] correct documentation of attribute ifunc (PR 81882)

2017-08-17 Thread Alexander Monakov
On Thu, 17 Aug 2017, Martin Sebor wrote: > returns a pointer to the selected implementation function. The > implementation functions' declarations must match the API of the > -function being implemented, the resolver's declaration is be a > -function returning pointer to void function returning

[PING][PATCH 2/3] retire mem_signal_fence pattern

2017-08-28 Thread Alexander Monakov
Ping (for this and patch 3/3 in the thread). On Wed, 2 Aug 2017, Alexander Monakov wrote: > Similar to mem_thread_fence issue from the patch 1/3, RTL representation of > __atomic_signal_fence must be a compiler barrier. We have just one backend > offering this pattern, and it does no

[PATCH] ira-costs: avoid missing base registers in record_address_regs

2017-08-28 Thread Alexander Monakov
Hello, The code in record_address_regs shown in the following patch assumes that if a given target cannot have two registers in a memory address, then the sole register, if present, must be the leftmost operand in the PLUS chain. I think this is not true if the target uses unspecs to signify spec

Re: [PING][PATCH 2/3] retire mem_signal_fence pattern

2017-08-31 Thread Alexander Monakov
On Thu, 31 Aug 2017, Jeff Law wrote: > This is OK. > > What's the point of the delete_insns_since calls in patch #3? Deleting the first barrier when maybe_expand_insn failed. Other functions in the file use a similar approach. Thanks. Alexander

Re: [PING][PATCH 2/3] retire mem_signal_fence pattern

2017-09-05 Thread Alexander Monakov
On Mon, 4 Sep 2017, Uros Bizjak wrote: > introduced a couple of regressions on x86 (-m32, 32bit) testsuite: > > New failures: > FAIL: gcc.target/i386/pr71245-1.c scan-assembler-not (fistp|fild) > FAIL: gcc.target/i386/pr71245-2.c scan-assembler-not movlps Sorry. I suggest that the tests be XFAI

Re: [PATCH, middle-end]: Introduce memory_blockage named insn pattern

2017-09-05 Thread Alexander Monakov
On Tue, 5 Sep 2017, Uros Bizjak wrote: > This patch allows to emit memory_blockage pattern instead of default > asm volatile as a memory blockage. This patch is needed, so targets > (e.g. x86) can define and emit more optimal memory blockage pseudo > insn. Optimal in what sense? What pattern do y

Re: [PATCH, middle-end]: Introduce memory_blockage named insn pattern

2017-09-05 Thread Alexander Monakov
On Tue, 5 Sep 2017, Uros Bizjak wrote: > However, this definition can't be generic, since unspec is used. I see, if the only reason this needs a named pattern is lack of generic UNSPEC values, I believe it would be helpful to mention that in the documentation. A few comments on the patch: > @@ -

Re: [PATCH v2, middle-end]: Introduce memory_blockage named insn pattern

2017-09-05 Thread Alexander Monakov
On Tue, 5 Sep 2017, Uros Bizjak wrote: > Revised patch, incorporates fixes from Alexander's review comments. > > I removed some implementation details from Alexander's description of > memory_blockage named pattern. Well, to me it wasn't really obvious why a named pattern was needed in the first

Re: [PATCH] Output DIEs for outlined OpenMP functions in correct lexical scope

2017-05-05 Thread Alexander Monakov
On Thu, 4 May 2017, Kevin Buettner wrote: > diff --git a/gcc/omp-expand.c b/gcc/omp-expand.c > index 5c48b78..7029951 100644 > --- a/gcc/omp-expand.c > +++ b/gcc/omp-expand.c > @@ -667,6 +667,25 @@ expand_parallel_call (struct omp_region *region, > basic_block bb, Outlined functions are also used

[PATCH 1/2] x86,s390: add compiler memory barriers when expanding atomic_thread_fence (PR 80640)

2017-05-10 Thread Alexander Monakov
Hi, When expanding __atomic_thread_fence(x) to RTL, the i386 backend doesn't emit any instruction except for x==__ATOMIC_SEQ_CST (which emits 'mfence'). This is incorrect: although no machine barrier is needed, the compiler still must emit a compiler barrier into the IR to prevent propagation an
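
Why the compiler barrier matters even when no machine instruction is needed can be seen in a fence-based message-passing pattern. This is a minimal C11 sketch with invented names, not code from the patch or the PR testcase.

```c
#include <stdatomic.h>
#include <stdbool.h>

static int payload;
static atomic_bool ready;

/* Writer: the release fence must keep the payload store ahead of the
   flag store.  On x86 no fence instruction is required for that, but
   the compiler still may not move the store across the fence.  */
static void publish(int value)
{
    payload = value;
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&ready, true, memory_order_relaxed);
}

/* Reader: the acquire fence must keep the payload load after the
   flag load.  */
static bool try_consume(int *out)
{
    if (!atomic_load_explicit(&ready, memory_order_relaxed))
        return false;
    atomic_thread_fence(memory_order_acquire);
    *out = payload;
    return true;
}
```

If the fence expanded to nothing at all in the compiler's intermediate representation, the plain accesses to `payload` could be moved across it, which is the kind of wrong-code issue this thread describes.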

Re: [PATCH 1/2] x86,s390: add compiler memory barriers when expanding atomic_thread_fence (PR 80640)

2017-05-10 Thread Alexander Monakov
While fixing the fences issue of PR80640 I've noticed that a similar oversight is present in expansion of atomic loads on x86: they become volatile loads, but that is insufficient, a compiler memory barrier is still needed. Volatility prevents tearing the load (preserves non-divisibility of atomic

Re: [PATCH] Kill -fdump-translation-unit

2017-05-10 Thread Alexander Monakov
On Wed, 10 May 2017, Richard Biener wrote: > On Tue, May 9, 2017 at 5:41 PM, Nathan Sidwell wrote: > > -fdump-translation-unit is an inscrutably opaque dump. It turned out that > > most of the uses of the tree-dump header file was to indirectly get at > > dumpfile.h, and the dump_function entry

Re: [PATCH] Kill -fdump-translation-unit

2017-05-10 Thread Alexander Monakov
On Wed, 10 May 2017, Jakub Jelinek wrote: > Can it at least be taken out of -fdump-tree-all? It is huge, often larger > than the sum of all the other dump files, and don't remember ever using it > for anything. Yes, apart from advertising the capability I don't imagine it's useful to produce that

Re: [PATCH] Add sequence check to leaf_function_p

2017-05-12 Thread Alexander Monakov
On Fri, 12 May 2017, Wilco Dijkstra wrote: > This is a followup from: > https://gcc.gnu.org/ml/gcc-patches/2016-11/msg02916.html > > Add an assert to leaf_function_p to ensure it is not called from a > prolog or epilog sequence (which would incorrectly return true in a > non-leaf function). As

Re: [PATCH 1/2] x86,s390: add compiler memory barriers when expanding atomic_thread_fence (PR 80640)

2017-05-17 Thread Alexander Monakov
Ping. (to be clear, patch 2/2 is my previous followup in this thread, I forgot to adjust the subject line; it should have said: "[PATCH 2/2] x86: add compiler memory barriers when expanding atomic_load"). On Wed, 10 May 2017, Alexander Monakov wrote: > Hi, >

Re: [PATCH] Prevent extract_muldiv from introducing an overflow (PR sanitizer/80800)

2017-05-19 Thread Alexander Monakov
On Fri, 19 May 2017, Richard Biener wrote: > On Fri, 19 May 2017, Marek Polacek wrote: > > > On Fri, May 19, 2017 at 09:58:45AM +0200, Richard Biener wrote: > > > On Fri, 19 May 2017, Marek Polacek wrote: > > > > > > > extract_muldiv folds > > > > > > > > (n * 1 * z) * 50 > > > > > > >

Re: [PATCH] Prevent extract_muldiv from introducing an overflow (PR sanitizer/80800)

2017-05-19 Thread Alexander Monakov
On Fri, 19 May 2017, Marek Polacek wrote: > > I think it's possible to keep this folding, note that it's valid to > > transform to > > > > (n * 1 * z) * 50 > > > > (i.e. accumulate multiplications on the outermost factor) (to be precise, if the multiplication is done in a signed type an

Re: [PATCH] Prevent extract_muldiv from introducing an overflow (PR sanitizer/80800)

2017-05-19 Thread Alexander Monakov
On Fri, 19 May 2017, Joseph Myers wrote: > On Fri, 19 May 2017, Alexander Monakov wrote: > > (to be precise, if the multiplication is done in a signed type and the > > middle > > constant factor was a negated power of two, the sign change needs to remain: >

Re: [PATCH 1/2] x86,s390: add compiler memory barriers when expanding atomic_thread_fence (PR 80640)

2017-05-26 Thread Alexander Monakov
On Wed, 17 May 2017, Alexander Monakov wrote: > Ping. Ping^2? > (to be clear, patch 2/2 is my previous followup in this thread, I forgot to > adjust the subject line; it should have said: > "[PATCH 2/2] x86: add compiler memory barriers when expanding atomic_load").

[PATCH doc] update documentation of x86 -mcx16 option

2017-05-26 Thread Alexander Monakov
Hi, This patch fixes a few issues in documentation of -mcx16 x86 backend option: - remove implementor-speak ('oword') - mention alignment restriction and availability only in 64-bit mode - improve usage example existing documentation uses a really silly example (128-bit integer counters),

Re: [PATCH] Add no_tail_call attribute

2017-05-29 Thread Alexander Monakov
Hi, On Mon, 29 May 2017, Yuri Gribov wrote: > Hi all, > > As discussed in > https://sourceware.org/ml/libc-alpha/2017-01/msg00455.html , some > libdl functions rely on return address to figure out the calling > DSO and then use this information in computation (e.g. output of dlsym > depends on w

Re: [PATCH] Dump function on internal errors

2017-05-29 Thread Alexander Monakov
Hi, On Wed, 24 May 2017, Richard Biener wrote: > current_pass might be NULL so you better do set_internal_error_hook when > we start executing passes (I detest global singletons to do such stuff > anyway). I think there are other problems in this patch, dump_function_to_file won't work after tra

Re: [PATCH] Dump function on internal errors

2017-05-29 Thread Alexander Monakov
On Mon, 29 May 2017, Jakub Jelinek wrote: > What if there is another ICE during the dumping? Won't we then > end in endless recursion? Perhaps global_dc->internal_error should > be cleared here first? Hm, no, as far as I can see existing diagnostic machinery is supposed to fully handle that. It

Re: [PATCH] Dump function on internal errors

2017-05-29 Thread Alexander Monakov
On Mon, 29 May 2017, Alexander Monakov wrote: > On Mon, 29 May 2017, Jakub Jelinek wrote: > > Also, as none of the arguments are used and we are in C++, > > perhaps it should be > > static void > > internal_error_function (diagnostic_context *, const char *, va_list

Re: [PATCH] Dump function on internal errors

2017-05-29 Thread Alexander Monakov
On Mon, 29 May 2017, Alexander Monakov wrote: > +/* This helper function is invoked from diagnostic routines prior to aborting > + due to internal compiler error. If a dump file is set up, dump the > + current function. */ > + > +void > +emergency_dump_function () > +

Re: [PATCH] Dump function on internal errors

2017-05-30 Thread Alexander Monakov
On Tue, 30 May 2017, Richard Biener wrote: > If you want to improve here I'd do > >if (current_pass) > fnotice (stderr, "during %s pass: %s\n", ... >if (dump_file && cfun) > { > fnotice (..); > execute_function_dump ... > } > > and I'd print the pass name ev

Re: [PATCH v2] Implement no_sanitize function attribute

2017-05-31 Thread Alexander Monakov
On Wed, 31 May 2017, Martin Liška wrote: > I added to common.opt: > Common RejectNegative Joined UInteger Var(flag_no_sanitize_fn) PerFunction > No sanitize flags for a function This needs a period at the end ("for a function."). > FAIL: compiler driver --help=optimizers option(s): "^ +-.*[^:.]$"

[committed] nvptx: adjust testcase for 'shared' attribute

2016-12-21 Thread Alexander Monakov
Hi, I have applied the following testsuite patch to fix one scan-assembler failure in the testcase for the 'shared' attribute caused by backend change to enable -fno-common by default. Alexander * gcc.target/nvptx/decl-shared.c (v_common): Add 'common' attribute to explicitly req

[PATCH 0/5] OpenMP/PTX: improve correctness in SIMD regions

2017-01-17 Thread Alexander Monakov
Hello, This patch series addresses a correctness issue in how OpenMP SIMD regions are transformed for SIMT execution. On NVPTX, OpenMP target code runs with per-warp stacks outside of SIMD regions, and needs to transition to per-lane stacks on SIMD region boundaries. Originally the plan was to i

[PATCH 1/5] omp-low: introduce omplow_simd_context

2017-01-17 Thread Alexander Monakov
In preparation to handle new SIMT privatization in lower_rec_simd_input_clauses this patch factors out variables common to this and lower_rec_input_clauses to a new structure. No functional change intended. * omp-low.c (omplow_simd_context): New struct. Use it... (lower_rec_simd_

[PATCH 2/5] ipa-inline: disallow inlining into SIMT regions

2017-01-17 Thread Alexander Monakov
This patch prevents inlining into SIMT code by introducing a new loop property 'in_simtreg' and using ANNOTATE_EXPR (_, 'simtreg') to carry this property between omp-low and the cfg pass (this is needed only for SIMD reduction helper loops; for main bodies of SIMD loops omp-expand sets loop->in_sim

[PATCH 5/5] omp-low: implement SIMT privatization

2017-01-17 Thread Alexander Monakov
This patch adjusts privatization in OpenMP SIMD loops lowered for SIMT targets. Addressable private variables become fields of new '.omp_simt' structure that is allocated by a call to GOMP_SIMT_ENTER (). This function is similar to __builtin_alloca_with_align, except that it obtains per-SIMT-lane

[PATCH 4/5] nvptx: implement SIMT enter/exit insns

2017-01-17 Thread Alexander Monakov
This patch adds handling of new omp_simt_enter/omp_simt_exit named insns in the NVPTX backend. * config/nvptx/nvptx-protos.h (nvptx_output_simt_enter): Declare. (nvptx_output_simt_exit): Declare. * config/nvptx/nvptx.c (nvptx_init_unisimt_predicate): Use cfun->machi

[PATCH 3/5] improve usage of PROP_gimple_lomp_dev

2017-01-17 Thread Alexander Monakov
This patch implements propagation of PROP_gimple_lomp_dev during inlining to allow using it to decide whether pass_omp_device_lower needs to run. We need to clear this property in expand_omp_simd when the _simt_ clause is present even if we are not doing any SIMT transforms, because we need to cle
