This is a respin of the recently reviewed -msoft-stack patch that addresses
review feedback and adds required libgcc changes.
gcc/:
* config/nvptx/nvptx-protos.h (nvptx_output_set_softstack): Declare.
* config/nvptx/nvptx.c (need_softstack_decl): New variable.
(init_softst
This patch implements emission of OpenMP target region entrypoints: the
compiler emits the target function with '$impl' appended to the name, and
under the original name it emits a short entry sequence that sets up shared
memory arrays and calls the target function via 'gomp_nvptx_main' (which is
i
This patch implements the -muniform-simt code generation option, which is used to
emit code for OpenMP offloading. The goal is to emit code that can either
execute "normally", or can execute in a way that keeps all lanes in a given
warp active, their local state synchronized, and observable effects fr
On Thu, 9 Jun 2016, Nathan Sidwell wrote:
> > (define_expand "restore_stack_block"
> >   [(match_operand 0 "register_operand" "")
> >    (match_operand 1 "register_operand" "")]
>
> you've not addressed my previous comments about this.
To be clear -- do you mean that "restore_stack_block" shou
On Wed, 1 Jun 2016, Richard Biener wrote:
> > On Wed, 1 Jun 2016, Richard Biener wrote:
> > > 2016-06-01 Richard Biener
> > >
> > > * match.pd ((A & B) - (A & ~B) -> B - (A ^ B)): Add missing :c.
> > > (relational patterns): Use :c to avoid pattern duplications.
> >
> > Should the same tre
On Sun, 12 Jun 2016, Sandra Loosemore wrote:
> On 06/09/2016 10:53 AM, Alexander Monakov wrote:
> > +@item -muniform-simt
> > +@opindex muniform-simt
> > +Generate code that allows to keep all lanes in each warp active, even when
>
> Allows *what* to keep? E.g. wha
Hi,
> 2016-06-12 Uros Bizjak
>
> PR target/71242
> * config/ia64/ia64.c (enum ia64_builtins) [IA64_BUILTIN_NANQ]: New.
> [IA64_BUILTIN_NANSQ]: Ditto.
> (ia64_fold_builtin): New function.
> (TARGET_FOLD_BUILTIN): New define.
> (ia64_init_builtins): Declare const_string_ty
On Thu, 9 Jun 2016, Alexander Monakov wrote:
> Hi,
>
> This patch teaches cgraphunit.c:output_in_order to output undefined external
> variables via assemble_undefined_decl. At the moment that is only done for
> -ftoplevel-reorder in varpool.c:symbol_table::output_variables. Thi
On Thu, 16 Jun 2016, Jan Hubicka wrote:
> > On Thu, 9 Jun 2016, Alexander Monakov wrote:
> + FOR_EACH_VARIABLE (pv)
[snip]
> + i = pv->order;
> + gcc_assert (nodes[i].kind == ORDER_UNDEFINED);
> + nodes[i].kind = pv->definition ? ORDER_VAR : ORDER_VAR_UNDE
On Thu, 16 Jun 2016, Jan Hubicka wrote:
> I see, order is created at the time a variable is added to the symbol table (not
> at the time its definition is given). So we should have order everywhere.
> Patch is OK
Thanks! If you don't mind a quick followup question: now that both
FOR_EACH_VARIABLE loops in
Ping.
On Mon, 13 Jun 2016, Alexander Monakov wrote:
> On Sun, 12 Jun 2016, Sandra Loosemore wrote:
> > On 06/09/2016 10:53 AM, Alexander Monakov wrote:
> > > +@item -muniform-simt
> > > +@opindex muniform-simt
> > > +Generate code that allows to keep
Hi,
I've discovered that this assert in my patch was too restrictive:
+ if (DECL_HAS_VALUE_EXPR_P (pv->decl))
+ {
+ gcc_checking_assert (lookup_attribute ("omp declare target link",
+                                        DECL_ATTRIBUTES (pv->decl)));
Testing for the
On Sat, 25 Jun 2016, H.J. Lu wrote:
> The resolver for ifunc functions might resolve to a non-local function.
I think the explanation doesn't match the testcase, in which all three
functions (the resolver, the symbol being resolved, and the ultimate
resolution) are static. I don't think there w
On Sun, 26 Jun 2016, H.J. Lu wrote:
> On Sun, Jun 26, 2016 at 12:49 AM, Alexander Monakov
> wrote:
> > On Sat, 25 Jun 2016, H.J. Lu wrote:
> >> The resolver for ifunc functions might resolve to a non-local function.
> >
> > I think the explanation doesn't
Ping.
On Thu, 5 Oct 2017, Alexander Monakov wrote:
> In ira-color.c, qsort comparator allocno_priority_compare_func lacks anti-
> commutativity and can indicate A < B < A if both allocnos satisfy
> non_spilled_static_chain_regno_p. It should fall through to the following
> sub-compar
On Tue, 24 Oct 2017, Jakub Jelinek wrote:
> loop transferring the addresses or firstprivate_int values to the device
> - where we issued mapnum host2dev transfers each just pointer-sized
> when we could have just prepared all the pointers in an array and host2dev
> copy them all together.
Can you p
On Tue, 24 Oct 2017, Jakub Jelinek wrote:
> > Why did you choose the 32KB and 4KB limits? I wonder if that would have
> > any impact on firstprivate_int values. If this proves to be effective,
> > it seems like we should be able to eliminate GOMP_MAP_FIRSTPRIVATE_INT
> > altogether.
>
> The thing i
On Thu, 8 Jun 2017, Alexander Monakov wrote:
> Ping^3.
Ping^4: https://gcc.gnu.org/ml/gcc-patches/2017-05/msg00782.html
This is a wrong-code issue with C11 atomics: even if no machine barrier is
needed for a given fence type on this architecture, a compiler barrier must
be present in
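For reference, the difference between the two kinds of barrier in inline-asm
terms (an illustration, not the patch itself):

  /* Compiler barrier: emits no instruction, but forbids the compiler
     from caching memory values across it or moving memory accesses
     around it.  */
  __asm__ __volatile__ ("" ::: "memory");

  /* Machine barrier on x86: additionally orders the processor.  */
  __asm__ __volatile__ ("mfence" ::: "memory");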
Hi,
This is a followup to https://gcc.gnu.org/ml/gcc-patches/2017-05/msg01545.html
Recently, due to a fix for PR 80800, GCC has lost the ability to reassociate
signed multiplication chains to go from 'X * CST1 * Y * CST2'
to 'X * Y * (CST1 * CST2)'. The fix to that PR prevents extract_muldiv from
On Thu, 13 Jul 2017, Marc Glisse wrote:
> I notice that we do not turn (X*10)*10 into X*100 in GIMPLE.
Sorry, could you clarify what you mean here? I think we certainly do that,
just not via match.pd, but in the 'associate:' case of fold_binary_loc.
> Relying on inner expressions being folded can be
On Thu, 13 Jul 2017, Marc Glisse wrote:
> I notice that we do not turn (X*10)*10 into X*100 in GIMPLE [...]
I've completely missed that. Posting another patch to address that.
> Relying on inner expressions being folded can be slightly dangerous,
> especially for generic IIRC. It seems easy enou
On Thu, 13 Jul 2017, Marc Glisse wrote:
> X*big*big where abs(big*big)>abs(INT_MIN) can be optimized to 0
I'm not sure that would be a win, eliminating X prevents the compiler from
deducing that X must be zero (if overflow invokes undefined behavior).
> the only hard case is when the product of t
This comparator lacks anti-commutativity and can indicate
A < B < A if both A and B satisfy non_spilled_static_chain_regno_p.
Proceed to the following tie-breakers in that case.
(it looks like the code incorrectly assumes that at most one register
in the array will satisfy non_spilled_static_chain_reg
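The shape of the fix, as a minimal sketch (the struct and predicate fields are
hypothetical stand-ins for the ira-color.c internals):

  typedef struct { int priority; int num; int static_chain_p; } allocno_t;

  static int
  allocno_cmp (const void *xp, const void *yp)
  {
    const allocno_t *x = *(const allocno_t *const *) xp;
    const allocno_t *y = *(const allocno_t *const *) yp;
    /* Broken form: 'if (x->static_chain_p) return -1;' yields both
       A < B and B < A when both flags are set.  Only let the flag
       decide the order when it distinguishes the two elements.  */
    if (x->static_chain_p != y->static_chain_p)
      return y->static_chain_p - x->static_chain_p;
    if (x->priority != y->priority)
      return x->priority < y->priority ? -1 : 1;
    return x->num - y->num;  /* unique numbers: total order, no ties */
  }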
The autopref_rank_for_schedule sub-comparator and its subroutine
autopref_rank_data lack transitivity. Skip checking if they are in use.
This heuristic is disabled by default everywhere except ARM and AArch64,
so on other targets this does not suppress checking all the time.
* haifa-sch
This qsort comparator lacks anti-commutativity and can indicate
A < B < A if A and B have the same bitpos. Return 0 in that case.
* gimple-ssa-store-merging.c (sort_by_bitpos): Return 0 on equal bitpos.
---
gcc/gimple-ssa-store-merging.c | 6 +++---
1 file changed, 3 insertions(+), 3 del
Subtracting values to produce a -/0/+ comparison value only works when the
original values have a limited range. Otherwise it leads to a broken
comparator that indicates 0 < 0x4000 < 0x8000 < 0.
Yuri posted an equivalent patch just a few hours ago:
https://gcc.gnu.org/ml/gcc-patches/2017-07/msg00
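A minimal illustration, assuming int keys:

  /* Broken: the subtraction wraps on large differences, so the sign
     of the result no longer reflects the comparison.  */
  static int
  cmp_sub (const void *xp, const void *yp)
  {
    return *(const int *) xp - *(const int *) yp;
  }

  /* Safe: an explicit three-way comparison works for the full range.  */
  static int
  cmp_3way (const void *xp, const void *yp)
  {
    int x = *(const int *) xp, y = *(const int *) yp;
    if (x < y)
      return -1;
    return x > y;
  }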
Hello,
(previous thread here:
https://gcc.gnu.org/ml/gcc-patches/2016-07/msg00944.html )
we still have a few places in GCC where the comparator function passed to qsort
is not actually a proper sorting predicate. Most commonly it fails to impose
total ordering by lacking transitivity. It's usef
This is the updated qsort comparator verifier.
Since we have vec::qsort(cmp), the patch uses the macro argument counting
trick to redirect only the four-argument invocations of qsort to qsort_chk.
I realize that won't win much sympathy, but a patch doing mass-renaming
of qsort in the whole GCC c
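The counting trick is roughly this (a sketch; the exact macro and argument
names in the patch differ):

  /* Expands to its fifth argument.  */
  #define PICK5(a1, a2, a3, a4, a5, ...) a5
  /* Four-argument calls select qsort_chk; a one-argument invocation
     such as vec's x.qsort (cmp) selects plain qsort, and that
     self-reference is not expanded further by the preprocessor.  */
  #define qsort(...) PICK5 (__VA_ARGS__, qsort_chk, 3, 2, qsort, 0) (__VA_ARGS__)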
The reload_pseudo_compare_func comparator, when used from assign_by_spills,
can be non-transitive, indicating A < B < C < A if both A and C satisfy
!bitmap_bit_p (&non_reload_pseudos, rAC), but B does not.
This function was originally a proper comparator, and the problematic
clause was added to fi
On Mon, 17 Jul 2017, Marc Glisse wrote:
> > +/* Combine successive multiplications. Similar to above, but handling
> > + overflow is different. */
> > +(simplify
> > + (mult (mult @0 INTEGER_CST@1) INTEGER_CST@2)
> > + (with {
> > + bool overflow_p;
> > + wide_int mul = wi::mul (@1, @2, TYP
On Mon, 17 Jul 2017, Alexander Monakov wrote:
> On Mon, 17 Jul 2017, Marc Glisse wrote:
> > > +/* Combine successive multiplications. Similar to above, but handling
> > > + overflow is different. */
> > > +(simplify
> > > + (mult (mult @0 INTE
On Wed, 19 Jul 2017, Richard Biener wrote:
> >> --- a/gcc/match.pd
> >> +++ b/gcc/match.pd
> >> @@ -283,6 +283,20 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> >> || mul != wi::min_value (TYPE_PRECISION (type), SIGNED))
> >> { build_zero_cst (type); })
> >>
> >> +/* Combine successiv
On Wed, 19 Jul 2017, Jeff Law wrote:
> > Glibc people were worried that the attribute would be lost when taking a
> > pointer to the function
> > (https://sourceware.org/ml/libc-alpha/2017-01/msg00482.html). I think
> > their reasoning was that return address is a shadow argument for
> > dlsym-like functio
On Wed, 19 Jul 2017, Jakub Jelinek wrote:
> > 1) recognize dlsym by name and suppress tailcalls to it
> >
> >this would solve >99% of cases because calling dlsym by pointer would be
> > rare,
> >and has the benefit of not requiring libc header changes;
>
> Recognizing by name is IMNSHO unde
On Wed, 19 Jul 2017, Yuri Gribov wrote:
> So to reiterate, your logic here is that someone would wipe dlsym type
> (e.g. by casting to void *), then later cast to another type which
> lacks tailcall attribute. So proposed solution won't protect against
> situation like this.
No, it's not "my logic
On Wed, 19 Jul 2017, Alexander Monakov wrote:
> > The one and only advantage of the attribute compared to Jakub's approach
> > (or yours, they share the same idea of wrapping dlsym calls) is that
> > it forces the user to carry it around when taking the address of a function.
>
> It
On Thu, 20 Jul 2017, Richard Biener wrote:
> >> So for saturating types isn't the issue when @1 and @2 have opposite
> >> sign and the inner multiply would have saturated?
> >
> > No, I think the only special case is @1 == @2 == -1, otherwise either @2 is
> > 0 or 1, or @1 * @2 is larger in magnitu
Hi,
Segher pointed out on IRC that ICE reporting with dumps enabled got worse:
if emergency_dump_function itself leads to an ICE (e.g. by segfaulting),
nested ICE reporting will invoke emergency_dump_function in exactly the
same context, but not only would we uselessly announce the current pass again,
Hello,
The final tie-breaker in the pair_cmp comparator looks strange: it correctly
yields zero for equal expr->symtree->n.sym values, but for unequal values
it produces 0 or 1. This would be correct for C++ STL-style comparators
that require a "less-than" predicate to be computed, but not for C qsort.
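For illustration, the difference for a pointer-sized key (a minimal sketch,
not the gfortran code itself):

  #include <stdbool.h>
  #include <stdint.h>

  /* "Less-than" predicate: fine for C++ std::sort, wrong as the tail
     of a C qsort comparator, since it never yields a negative value.  */
  static bool sym_less (uintptr_t a, uintptr_t b) { return a < b; }

  /* A C qsort tie-breaker must be three-way; the usual idiom is:  */
  static int sym_cmp (uintptr_t a, uintptr_t b) { return (a > b) - (a < b); }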
Previous revision here: https://gcc.gnu.org/ml/gcc-patches/2017-07/msg00889.html
Reassociate (X * CST) * Y to (X * Y) * CST; this pushes constants in
multiplication chains to outermost factors, where they can be combined.
Changed in this revision:
- remove !TYPE_OVERFLOW_SANITIZED and !TYPE_SATUR
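The shape of such a rule in match.pd would be roughly this (a hypothetical
sketch; the overflow and single-use conditions discussed in the thread are
omitted):

  (simplify
   (mult (mult:s @0 INTEGER_CST@1) @2)
   (if (TREE_CODE (@2) != INTEGER_CST)
    (mult (mult @0 @2) @1)))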
Previous revision here: https://gcc.gnu.org/ml/gcc-patches/2017-07/msg01090.html
Reassociate X * CST1 * CST2 to X * (CST1 * CST2).
Changed in this revision:
- remove the check for @2 being 0 or -1
* match.pd ((X * CST1) * CST2): Simplify to X * (CST1 * CST2).
testsuite:
* gcc.dg/
Profiling uses of qsort in GCC reveals that a significant portion of calls
comes from domwalk.c where child nodes in dominator tree are reordered
according to postorder numbering. However we know that in dominator trees
the vast majority of nodes have 0, 2 or 3 children, and handling those
cases s
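The idea, as a sketch (hypothetical signature; the real code orders basic
blocks through a postorder-number comparator and keeps qsort as the fallback):

  /* Order up to three child indices ascending with at most three
     compare/swap steps; larger counts fall back to qsort.  */
  static void
  sort_small (int *a, int n)
  {
    int t;
    if (n >= 2 && a[0] > a[1])
      t = a[0], a[0] = a[1], a[1] = t;
    if (n == 3 && a[1] > a[2])
      {
        t = a[1], a[1] = a[2], a[2] = t;
        if (a[0] > a[1])
          t = a[0], a[0] = a[1], a[1] = t;
      }
  }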
On Sat, 22 Jul 2017, Segher Boessenkool wrote:
> On Sat, Jul 15, 2017 at 11:47:45PM +0300, Alexander Monakov wrote:
> > --- a/gcc/gimple-ssa-store-merging.c
> > +++ b/gcc/gimple-ssa-store-merging.c
> > @@ -516,12 +516,12 @@ sort_by_bitpos (const void *x, const void *y)
> &
On Mon, 24 Jul 2017, Jeff Law wrote:
> As Uli noted, we should be using std::swap.
>
> Can you please repost ?
* domwalk.c (cmp_bb_postorder): Simplify.
(sort_bbs_postorder): New function. Use it...
(dom_walker::walk): ...here to optimize common cases.
---
gcc/domwalk.c
On Tue, 25 Jul 2017, Alexander Monakov wrote:
> --- a/gcc/domwalk.c
> +++ b/gcc/domwalk.c
> @@ -128,19 +128,46 @@ along with GCC; see the file COPYING3. If not see
> which is currently an abstraction over walking tree statements. Thus
> the dominator walker is currently
On Tue, 25 Jul 2017, Kyrill Tkachov wrote:
> For the uses of this function the order when the bitpos is the same
> does not matter; I just wanted to avoid returning zero, to avoid perturbations
> due to qsort.
But you can't stabilize qsort in that manner; in fact, by making the comparator
invalid yo
On Wed, 26 Jul 2017, Jeff Law wrote:
> So I think this is up to the target maintainers. I have no concerns
> with enabling use of expand_asm_memory_barrier to be used outside of
> optabs. So if the s390/x86 maintainers want to go forward, the optabs
> changes are pre-approved.
Please see the alt
On Sat, 22 Jul 2017, Segher Boessenkool wrote:
> On Thu, Jul 20, 2017 at 05:40:28PM +0300, Alexander Monakov wrote:
> > Segher pointed out on IRC that ICE reporting with dumps enabled got worse:
> > if emergency_dump_function itself leads to an ICE (e.g. by segfaulting),
> >
On Wed, 26 Jul 2017, Jeff Law wrote:
> I'm not sure what you mean by extraneous compiler barriers -- isn't the
> worst case scenario here that the target emits them as well? So there
> would be an extraneous one in that case, but that ought to be a "don't
> care".
Yes, exactly this.
> In the mid
On Wed, 26 Jul 2017, Alexander Monakov wrote:
> On Wed, 26 Jul 2017, Jeff Law wrote:
> > I'm not sure what you mean by extraneous compiler barriers -- isn't the
> > worst case scenario here that the target emits them as well? So there
> > would be an extraneous o
On Mon, 31 Jul 2017, Jeff Law wrote:
> >> In the middle end patch, do we need a barrier before the fence as well?
> >> The post-fence barrier prevents reordering the fence with anything which
> >> follows the fence. But do we have to also prevent reordering the fence
> >> with prior instructions w
On Mon, 31 Jul 2017, Jeff Law wrote:
> > Please consider that expand_mem_thread_fence is used to place fences around
> > seq-cst atomic loads&stores when the backend doesn't provide a direct
> > pattern.
> > With compiler barriers on both sides of the machine barrier, the generated
> > sequence fo
On Mon, 31 Jul 2017, Jeff Law wrote:
> I must have missed something. Can't you just define
>
> qsort (BASE, NMEMB, SIZE, COMPARE) into
>
> qsort_chk (BASE, NMEMB, SIZE, COMPARE)
>
> That shouldn't affect the qsort from vec? Right? Or am I missing something
If you do
#define qsort(base, n
Hello,
On Thu, 20 Jul 2017, Alexander Monakov wrote:
> Segher pointed out on IRC that ICE reporting with dumps enabled got worse:
> if emergency_dump_function itself leads to an ICE (e.g. by segfaulting),
> nested ICE reporting will invoke emergency_dump_function in exactly the
>
As recently discussed in the previous thread for PR 80640, some targets have
sufficiently strong hardware memory ordering that implementation of C11 atomic
fences might not need machine barriers. However, a compiler memory barrier
nevertheless must be present, and at least two targets (x86, s390)
Again, like in patch 1/3, a backend might expand C11 atomic load/store to a
volatile memory access if hardware memory ordering is sufficiently strong that
no machine barrier is required. Nevertheless, we must ensure that compiler
memory barrier(s) are present before/after the access to prevent wro
Similar to the mem_thread_fence issue from patch 1/3, the RTL representation of
__atomic_signal_fence must be a compiler barrier. We have just one backend
offering this pattern, and it does not place a compiler barrier.
It does not appear useful to expand signal_fence to some kind of hardware
instruc
On Wed, 2 Aug 2017, Jeff Law wrote:
> Well, there's not *that* many qsort calls. My quick grep shows 94 and
> it's a very mechanical change. Then a poison in system.h to ensure raw
> calls to qsort don't return.
Any suggestion for the non-poisoned replacement? xqsort? gcc_qsort?
Can you review
On Wed, 2 Aug 2017, Jeff Law wrote:
> >> Well, there's not *that* many qsort calls. My quick grep shows 94 and
> >> it's a very mechanical change. Then a poison in system.h to ensure raw
> >> calls to qsort don't return.
Note that poisoning qsort outlaws vec::qsort too; it would need to be mass-
On Thu, 3 Aug 2017, Jakub Jelinek wrote:
> Do we really need to rename and poison anything? qsort () in the source is
> something that is most obvious to developers, so trying to force them to use
> something different will just mean extra thing to learn.
Yep, I'd prefer to have a solution that k
On Fri, 4 Aug 2017, Oleg Endo wrote:
> > Note that with vec::qsort -> vec::sort renaming (which should be less
> > controversial, STL also has std::vector::sort)
>
> No it doesn't? One uses std::sort from <algorithm> on a pair of random
> access iterators to sort a std::vector.
My mistake, but the main poi
On Fri, 4 Aug 2017, Wilco Dijkstra wrote:
> This patch simplifies pow (C, x) into exp (x * C1), where C1 = log (C).
I don't think you can do that for non-positive C.
> Do this only for fast-math as accuracy is reduced. This is much faster
> since pow is more complex than exp - with a current GLI
On Sat, 5 Aug 2017, Richard Sandiford wrote:
> It would be simpler to test whether targetm.gen_mem_thread_fence
> returns NULL.
>
> This feels a bit hacky though. Checking whether a generator produced no
> instructions is usually the test for whether the generator FAILed, which
> should normally
On Mon, 7 Aug 2017, Michael Matz wrote:
> > I am looking for a run-time test which breaks the unwinder.
>
> I don't have one handy. Idea: make two threads, one endlessly looping in
> the "frame-less" function, the other causing a signal to the first thread,
> and the signal handler checking that un
On Wed, 9 Aug 2017, Jeff Law wrote:
> >> The _5th macro isn't that bad either, apart from using reserved namespace
> >> identifiers (it really should be something like qsort_5th and the arguments
> >> shouldn't start with underscores).
> >
> > I didn't understand what Jeff found "ugly" about it;
On Thu, 17 Aug 2017, Wilco Dijkstra wrote:
> This patch simplifies pow (C, x) into exp (x * C1) if C > 0, C1 = log (C).
Note this changes the outcome for C == +Inf, x == 0 (pow is specified to
return 1.0 in that case, but x * C1 == NaN). There's another existing
transform with the same issue, 'p
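The corner case in concrete terms (Annex F specifies pow(x, ±0) == 1 for any
x, including an infinity):

  #include <math.h>
  #include <stdio.h>

  int main (void)
  {
    double C = INFINITY, x = 0.0;
    printf ("%g\n", pow (C, x));         /* 1 */
    printf ("%g\n", exp (x * log (C)));  /* nan: 0 * inf is a NaN */
    return 0;
  }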
On Thu, 17 Aug 2017, Martin Sebor wrote:
> returns a pointer to the selected implementation function. The
> implementation functions' declarations must match the API of the
> -function being implemented, the resolver's declaration is be a
> -function returning pointer to void function returning
Ping (for this and patch 3/3 in the thread).
On Wed, 2 Aug 2017, Alexander Monakov wrote:
> Similar to the mem_thread_fence issue from patch 1/3, the RTL representation of
> __atomic_signal_fence must be a compiler barrier. We have just one backend
> offering this pattern, and it does no
Hello,
The code in record_address_regs shown in the following patch assumes that
if a given target cannot have two registers in a memory address, then the
sole register, if present, must be the leftmost operand in the PLUS chain.
I think this is not true if the target uses unspecs to signify spec
On Thu, 31 Aug 2017, Jeff Law wrote:
> This is OK.
>
> What's the point of the delete_insns_since calls in patch #3?
Deleting the first barrier when maybe_expand_insn failed.
Other functions in the file use a similar approach.
Thanks.
Alexander
On Mon, 4 Sep 2017, Uros Bizjak wrote:
> introduced a couple of regressions on x86 (-m32, 32bit) testsuite:
>
> New failures:
> FAIL: gcc.target/i386/pr71245-1.c scan-assembler-not (fistp|fild)
> FAIL: gcc.target/i386/pr71245-2.c scan-assembler-not movlps
Sorry. I suggest that the tests be XFAI
On Tue, 5 Sep 2017, Uros Bizjak wrote:
> This patch allows emitting the memory_blockage pattern instead of the default
> asm volatile as a memory blockage. This patch is needed, so targets
> (e.g. x86) can define and emit more optimal memory blockage pseudo
> insn.
Optimal in what sense? What pattern do y
On Tue, 5 Sep 2017, Uros Bizjak wrote:
> However, this definition can't be generic, since unspec is used.
I see, if the only reason this needs a named pattern is lack of generic UNSPEC
values, I believe it would be helpful to mention that in the documentation.
A few comments on the patch:
> @@ -
On Tue, 5 Sep 2017, Uros Bizjak wrote:
> Revised patch, incorporates fixes from Alexander's review comments.
>
> I removed some implementation details from Alexander's description of
> memory_blockage named pattern.
Well, to me it wasn't really obvious why a named pattern was needed
in the first
On Thu, 4 May 2017, Kevin Buettner wrote:
> diff --git a/gcc/omp-expand.c b/gcc/omp-expand.c
> index 5c48b78..7029951 100644
> --- a/gcc/omp-expand.c
> +++ b/gcc/omp-expand.c
> @@ -667,6 +667,25 @@ expand_parallel_call (struct omp_region *region,
> basic_block bb,
Outlined functions are also used
Hi,
When expanding __atomic_thread_fence(x) to RTL, the i386 backend doesn't emit
any instruction except for x==__ATOMIC_SEQ_CST (which emits 'mfence'). This
is incorrect: although no machine barrier is needed, the compiler still must
emit a compiler barrier into the IR to prevent propagation an
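A minimal example of the wrong code this permits:

  #include <stdatomic.h>

  int data;
  atomic_int ready;

  void publish (void)
  {
    data = 42;
    /* On x86 a release fence needs no machine instruction, but without
       at least a compiler barrier here the store to 'data' may be
       reordered after the store to 'ready'.  */
    atomic_thread_fence (memory_order_release);
    atomic_store_explicit (&ready, 1, memory_order_relaxed);
  }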
While fixing the fences issue of PR80640 I've noticed that a similar oversight
is present in the expansion of atomic loads on x86: they become volatile loads,
but that is insufficient: a compiler memory barrier is still needed. Volatility
prevents tearing the load (preserves non-divisibility of atomic
On Wed, 10 May 2017, Richard Biener wrote:
> On Tue, May 9, 2017 at 5:41 PM, Nathan Sidwell wrote:
> > -fdump-translation-unit is an inscrutably opaque dump. It turned out that
> > most of the uses of the tree-dump header file were to indirectly get at
> > dumpfile.h, and the dump_function entry
On Wed, 10 May 2017, Jakub Jelinek wrote:
> Can it at least be taken out of -fdump-tree-all? It is huge, often larger
> than the sum of all the other dump files, and I don't remember ever using it
> for anything.
Yes, apart from advertising the capability I don't imagine it's useful to
produce that
On Fri, 12 May 2017, Wilco Dijkstra wrote:
> This is a followup from:
> https://gcc.gnu.org/ml/gcc-patches/2016-11/msg02916.html
>
> Add an assert to leaf_function_p to ensure it is not called from a
> prolog or epilog sequence (which would incorrectly return true in a
> non-leaf function).
As
Ping.
(to be clear, patch 2/2 is my previous followup in this thread, I forgot to
adjust the subject line; it should have said:
"[PATCH 2/2] x86: add compiler memory barriers when expanding atomic_load").
On Wed, 10 May 2017, Alexander Monakov wrote:
> Hi,
>
On Fri, 19 May 2017, Richard Biener wrote:
> On Fri, 19 May 2017, Marek Polacek wrote:
>
> > On Fri, May 19, 2017 at 09:58:45AM +0200, Richard Biener wrote:
> > > On Fri, 19 May 2017, Marek Polacek wrote:
> > >
> > > > extract_muldiv folds
> > > >
> > > > (n * 1 * z) * 50
> > > >
> > >
On Fri, 19 May 2017, Marek Polacek wrote:
> > I think it's possible to keep this folding, note that it's valid to
> > transform to
> >
> > (n * 1 * z) * 50
> >
> > (i.e. accumulate multiplications on the outermost factor)
(to be precise, if the multiplication is done in a signed type an
On Fri, 19 May 2017, Joseph Myers wrote:
> On Fri, 19 May 2017, Alexander Monakov wrote:
> > (to be precise, if the multiplication is done in a signed type and the
> > middle
> > constant factor was a negated power of two, the sign change needs to remain:
> &
On Wed, 17 May 2017, Alexander Monakov wrote:
> Ping.
Ping^2?
> (to be clear, patch 2/2 is my previous followup in this thread, I forgot to
> adjust the subject line; it should have said:
> "[PATCH 2/2] x86: add compiler memory barriers when expanding atomic_load").
&g
Hi,
This patch fixes a few issues in the documentation of the -mcx16 x86 backend option:
- remove implementor-speak ('oword')
- mention alignment restriction and availability only in 64-bit mode
- improve usage example
the existing documentation uses a really silly example (128-bit integer
counters),
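For reference, what -mcx16 actually enables is the 16-byte cmpxchg16b
instruction; a usage sketch in that spirit (whether GCC expands this inline or
emits a libatomic call depends on the compiler version):

  /* Compile with -mcx16; 64-bit mode only, and the object must be
     16-byte aligned.  */
  #include <stdbool.h>

  static __int128 cell __attribute__ ((aligned (16)));

  bool cas128 (__int128 expected, __int128 desired)
  {
    return __atomic_compare_exchange_n (&cell, &expected, desired, false,
                                        __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
  }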
Hi,
On Mon, 29 May 2017, Yuri Gribov wrote:
> Hi all,
>
> As discussed in
> https://sourceware.org/ml/libc-alpha/2017-01/msg00455.html , some
> libdl functions rely on return address to figure out the calling
> DSO and then use this information in computation (e.g. output of dlsym
> depends on w
Hi,
On Wed, 24 May 2017, Richard Biener wrote:
> current_pass might be NULL so you better do set_internal_error_hook when
> we start executing passes (I detest global singletons to do such stuff
> anyway).
I think there are other problems in this patch; dump_function_to_file won't work
after tra
On Mon, 29 May 2017, Jakub Jelinek wrote:
> What if there is another ICE during the dumping? Won't we then
> end up in endless recursion? Perhaps global_dc->internal_error should
> be cleared here first?
Hm, no, as far as I can see the existing diagnostic machinery is supposed to fully
handle that. It
On Mon, 29 May 2017, Alexander Monakov wrote:
> On Mon, 29 May 2017, Jakub Jelinek wrote:
> > Also, as none of the arguments are used and we are in C++,
> > perhaps it should be
> > static void
> > internal_error_function (diagnostic_context *, const char *, va_list
On Mon, 29 May 2017, Alexander Monakov wrote:
> +/* This helper function is invoked from diagnostic routines prior to aborting
> + due to internal compiler error. If a dump file is set up, dump the
> + current function. */
> +
> +void
> +emergency_dump_function ()
> +
On Tue, 30 May 2017, Richard Biener wrote:
> If you want to improve here I'd do
>
>if (current_pass)
> fnotice (stderr, "during %s pass: %s\n", ...
>if (dump_file && cfun)
> {
> fnotice (..);
> execute_function_dump ...
> }
>
> and I'd print the pass name ev
On Wed, 31 May 2017, Martin Liška wrote:
> I added to common.opt:
> Common RejectNegative Joined UInteger Var(flag_no_sanitize_fn) PerFunction
> No sanitize flags for a function
This needs a period at the end ("for a function.").
> FAIL: compiler driver --help=optimizers option(s): "^ +-.*[^:.]$"
Hi,
I have applied the following testsuite patch to fix one scan-assembler failure
in the testcase for the 'shared' attribute, caused by the backend change to enable
-fno-common by default.
Alexander
* gcc.target/nvptx/decl-shared.c (v_common): Add 'common' attribute to
explicitly req
Hello,
This patch series addresses a correctness issue in how OpenMP SIMD regions are
transformed for SIMT execution. On NVPTX, OpenMP target code runs with
per-warp stacks outside of SIMD regions, and needs to transition to per-lane
stacks on SIMD region boundaries. Originally the plan was to i
In preparation for handling new SIMT privatization in lower_rec_simd_input_clauses,
this patch factors out variables common to this and lower_rec_input_clauses to
a new structure. No functional change intended.
* omp-low.c (omplow_simd_context): New struct. Use it...
(lower_rec_simd_
This patch prevents inlining into SIMT code by introducing a new loop
property 'in_simtreg' and using ANNOTATE_EXPR (_, 'simtreg') to carry this
property between omp-low and the cfg pass (this is needed only for SIMD
reduction helper loops; for main bodies of SIMD loops omp-expand sets
loop->in_sim
This patch adjusts privatization in OpenMP SIMD loops lowered for SIMT targets.
Addressable private variables become fields of new '.omp_simt' structure that
is allocated by a call to GOMP_SIMT_ENTER (). This function is similar to
__builtin_alloca_with_align, except that it obtains per-SIMT-lane
This patch adds handling of new omp_simt_enter/omp_simt_exit named insns
in the NVPTX backend.
* config/nvptx/nvptx-protos.h (nvptx_output_simt_enter): Declare.
(nvptx_output_simt_exit): Declare.
* config/nvptx/nvptx.c (nvptx_init_unisimt_predicate): Use
cfun->machi
This patch implements propagation of PROP_gimple_lomp_dev during inlining to
allow using it to decide whether pass_omp_device_lower needs to run.
We need to clear this property in expand_omp_simd when the _simt_ clause is
present even if we are not doing any SIMT transforms, because we need to
cle