https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116458
--- Comment #9 from Alexander Monakov ---
Okay, if you take the addition and the branch from the inlined variant:
addl %eax, %edx
je .L3
and add a 'test' instruction:
addl %eax, %edx
test %edx, %edx
je .L3
then Valgrind doesn't complain. So
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116458
--- Comment #12 from Alexander Monakov ---
Thanks. It's probably nicer to deduplicate computation of required padding to a
common header (libcpp/internal.h), I'll send a patch to that effect.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116483
Alexander Monakov changed:
What|Removed |Added
CC||amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116458
Alexander Monakov changed:
What|Removed |Added
Status|ASSIGNED|RESOLVED
Resolution|---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116483
--- Comment #9 from Alexander Monakov ---
(In reply to Xi Ruoyao from comment #8)
> Is there any pointer how to implement this instead?
It may be sufficient to change
(define_insn "@tablejump"
[(set (pc)
(match_operand:P 0 "register_
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116483
--- Comment #11 from Alexander Monakov ---
> It only handles switch statements, not computed gotos.
Oh, right, apologies for misunderstanding your question like that. For computed
gotos it is indeed not so easy, especially if there is more than
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116738
Alexander Monakov changed:
What|Removed |Added
CC||amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109187
--- Comment #2 from Alexander Monakov ---
This is caused by overflowing subtraction in autopref_rank_for_schedule:
if (!irrel1 && !irrel2)
/* Sort memory references from lowest offset to the largest. */
r = data1->offset
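The snippet above is cut off, so the following is only a minimal sketch of the general hazard it names, not the scheduler's code or its eventual fix. A sort comparator that returns the difference of two signed offsets can overflow when the operands are far apart; a three-way comparison built from `>` and `<` avoids the subtraction entirely. `struct ref`, `cmp_offset` and `sort_by_offset` are hypothetical names introduced for illustration.

```c
#include <limits.h>   /* INT_MIN / INT_MAX, used in the usage note */
#include <stdlib.h>   /* qsort */

/* Hypothetical record: only the offset matters for this sketch. */
struct ref {
    int offset;
};

/* Overflow-safe three-way compare: yields -1, 0 or 1 without ever
   computing a->offset - b->offset, which is undefined behavior when
   the mathematical result does not fit in int. */
int cmp_offset(const void *pa, const void *pb)
{
    const struct ref *a = pa;
    const struct ref *b = pb;
    return (a->offset > b->offset) - (a->offset < b->offset);
}

/* Sort references from the lowest offset to the largest. */
void sort_by_offset(struct ref *v, size_t n)
{
    qsort(v, n, sizeof *v, cmp_offset);
}
```

Sorting an array holding `INT_MAX`, `INT_MIN` and `0` with `sort_by_offset` orders it as `INT_MIN`, `0`, `INT_MAX`, whereas a subtraction-based comparator would overflow on the first pair.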
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109187
--- Comment #3 from Alexander Monakov ---
The reduced case is offsetting stack variables in a manner that seems too
invalid for my taste, so I plan to send a patch with the following testcase
instead (needs -O2 --param sched-autopref-queue-depth=1
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: amonakov at gcc dot gnu.org
Target Milestone: ---
Target: aarch64-*-*
With -O2 -mstrict-align, for
void f(void *p)
{
__builtin_memset(p, 0, 16);
}
gcc-10 and before
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84402
Alexander Monakov changed:
What|Removed |Added
CC||amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109187
Alexander Monakov changed:
What|Removed |Added
Resolution|--- |FIXED
Status|ASSIGNED
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109368
Alexander Monakov changed:
What|Removed |Added
CC||amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109369
Alexander Monakov changed:
What|Removed |Added
CC||amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109369
--- Comment #5 from Alexander Monakov ---
Indeed, sorry, __attribute__((used)) seems a much better solution for symbols
that might be referenced implicitly, in a manner that LTO plugin cannot see.
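As a minimal sketch of the attribute mentioned above: marking a definition with `__attribute__((used))` asks GCC to emit it even when no reference is visible in the translation unit. The names `my_runtime_hook` and `hook_calls` are hypothetical and stand in for whatever symbol is referenced implicitly (for example by the linker or startup code); this is not code from the bug report.

```c
/* Hypothetical counter so the effect of the hook can be observed. */
int hook_calls = 0;

/* 'used' tells the compiler the definition must be kept even if it
   sees no caller, which matters when the only reference is implicit
   and invisible to the optimizer. */
__attribute__((used))
void my_runtime_hook(void)
{
    hook_calls++;
}
```

The attribute only affects whether the definition is emitted. Whether the linker then resolves the implicit reference is a separate question.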
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109369
--- Comment #7 from Alexander Monakov ---
Yes, ld should claim _pei386_runtime_relocator (even if later it becomes
unneeded due to zero relocations left to fix up) to make this work properly.
That's for Binutils to fix on their side.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109469
Alexander Monakov changed:
What|Removed |Added
CC||amonakov at gcc dot gnu.org
||amonakov at gcc dot gnu.org
Status|UNCONFIRMED |RESOLVED
--- Comment #4 from Alexander Monakov ---
This is also SLP emitting a vector ctor in an unexpected place, just like in
PR109469, so I'll go ahead and mark it as a dup. Thanks fo
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109469
--- Comment #8 from Alexander Monakov ---
*** Bug 109477 has been marked as a duplicate of this bug. ***
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109369
--- Comment #9 from Alexander Monakov ---
(In reply to Pali Rohár from comment #8)
> So from the discussion, do I understand correctly that this is rather LD
> linker issue?
Yes, ld changes will be needed to make this work automatically, withou
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109585
Alexander Monakov changed:
What|Removed |Added
CC||amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109585
--- Comment #19 from Alexander Monakov ---
Manually minimized testcase for investigation, miscompiled at -O2:
struct P {
long v;
struct P *n;
};
struct F {
long x;
struct P fam[];
};
int f(struct F *f, int i)
{
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109587
Alexander Monakov changed:
What|Removed |Added
CC||amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109634
Alexander Monakov changed:
What|Removed |Added
CC||amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90746
Alexander Monakov changed:
What|Removed |Added
CC||amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109690
Alexander Monakov changed:
What|Removed |Added
CC||amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109780
Alexander Monakov changed:
What|Removed |Added
CC||amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109780
--- Comment #10 from Alexander Monakov ---
(In reply to Martin Liška from comment #9)
> Started with zen tuning revision r13-4839-geef81eefcdc2a5.
The issue is also reproducible with -march=haswell or -march=skylake, so you
can use those for fu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109780
--- Comment #12 from Alexander Monakov ---
Eh, that commit sneakily changed avx2 tuning without explaining that in the
Changelog. Anyway, it should be possible to "workaround" that by compiling with
-O2 -mavx2 -mtune=skylake-avx512
instead, in
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109780
Alexander Monakov changed:
What|Removed |Added
Summary|csmith: runtime crash with |[12/13/14 Regression]
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106943
Alexander Monakov changed:
What|Removed |Added
CC||amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106902
--- Comment #20 from Alexander Monakov ---
I missed it the first time around, but placing PAREN_EXPR around the complete
expression won't work: nothing will prevent GCC from duplicating evaluations of
the sub-expressions, and then randomly formi
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106943
--- Comment #7 from Alexander Monakov ---
This problem seems to go way back. I'm told even gcc-9 broke LLVM like that.
For my investigation, I took latest gcc-11 snapshot and llvm-13.0.1.
My conclusion is that it is a lifetime-dse violation in LL
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106943
--- Comment #8 from Alexander Monakov ---
Ah, forgot to mention that compiling the offending User.cpp without -flto also
avoids the problem.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106943
--- Comment #10 from Alexander Monakov ---
Indeed, that makes things easier, thanks.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106943
--- Comment #12 from Alexander Monakov ---
That would not fix the problem, lifetime-dse affects code that creates 'class
User' objects, not the implementation of its 'operator new' override.
(also the linked bug says "MDNode has the same patter
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106943
--- Comment #14 from Alexander Monakov ---
(In reply to Jan Hubicka from comment #13)
> Indeed it is quite long time problem with clang not building with lifetime
> DSE and strict aliasing. I wonder why this is not fixed on clang side?
Because
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106943
--- Comment #17 from Alexander Monakov ---
Right, thanks, I think SUSE build log confirms that (careful, large file):
https://build.opensuse.org/public/build/openSUSE:Factory/standard/x86_64/llvm16/_log
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106943
--- Comment #21 from Alexander Monakov ---
(In reply to Xi Ruoyao from comment #18)
> Maybe. Should we send a patch?
Yes, if we have a volunteer.
> If I read the LLVM code correctly, -fno-strict-aliasing is enabled for
> Clang, but not other
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106943
--- Comment #22 from Alexander Monakov ---
(In reply to Jan Hubicka from comment #19)
> It would be really nice to have the ranger bug fixed. Since lifetime
> DSE is all handled in C++ FE there is no good reason why it should not
> work to LTO
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106943
--- Comment #24 from Alexander Monakov ---
Appreciate the advice. So far I've managed to reduce the number of LTO inputs
down to two files, RegisterBankInfo.cpp.o plus APInt.cpp.o. I also built
gcc-12.3 with lineinfo and have a better backtrace:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106943
--- Comment #26 from Alexander Monakov ---
Would that help? GCC raises its own stack limit to 64MB:
gcc.cc: stack_limit_increase (64 * 1024 * 1024);
toplev.cc: stack_limit_increase (64 * 1024 * 1024);
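For readers unfamiliar with what raising a stack limit involves, here is a minimal POSIX sketch. It is an assumption-laden illustration, not GCC's `stack_limit_increase`: the helper name `raise_stack_limit` is hypothetical, and it simply raises the soft `RLIMIT_STACK` value up to the requested size, capped by the hard limit.

```c
#include <sys/resource.h>   /* getrlimit, setrlimit, RLIMIT_STACK */

/* Hypothetical helper: raise the soft stack limit to at least `bytes`,
   never exceeding the hard limit. Returns 0 on success, -1 on error. */
int raise_stack_limit(rlim_t bytes)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_STACK, &rl) != 0)
        return -1;

    /* Nothing to do if the soft limit is already large enough. */
    if (rl.rlim_cur == RLIM_INFINITY || rl.rlim_cur >= bytes)
        return 0;

    /* An unprivileged process cannot go above the hard limit. */
    if (rl.rlim_max != RLIM_INFINITY && bytes > rl.rlim_max)
        bytes = rl.rlim_max;

    rl.rlim_cur = bytes;
    return setrlimit(RLIMIT_STACK, &rl);
}
```

A call such as `raise_stack_limit((rlim_t)64 * 1024 * 1024)` mirrors the 64MB figure quoted above. On most systems the new limit governs how far the main thread's stack may grow afterwards.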
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106943
--- Comment #31 from Alexander Monakov ---
(In reply to Xi Ruoyao from comment #28)
> "To put it simply, operator delete for class User inspects memory of the
> object after the end of its lifetime. This shows as a use-after-dtor error
> when ru
-valid-code
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: amonakov at gcc dot gnu.org
Target Milestone: ---
Created attachment 55076
--> https://gcc.gnu.org/bugzilla/attachment.cgi
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106943
--- Comment #32 from Alexander Monakov ---
Ranger ICE is PR 109841 (reduced so it doesn't need LTO).
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109849
Alexander Monakov changed:
What|Removed |Added
CC||amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109806
Alexander Monakov changed:
What|Removed |Added
CC||amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114480
--- Comment #20 from Alexander Monakov ---
(note that if you uninclude the testcase and compile with -fno-exceptions it's
much faster)
On the smaller testcase from comment 14, prune_unused_phi_nodes invokes
gcc_qsort 53386 times. There are two
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114480
--- Comment #21 from Alexander Monakov ---
It is possible to reduce gcc_qsort workload by improving the presorted-ness of
the array, but of course avoiding quadratic behavior would be much better.
With the following change, we go from
261,2
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114765
Alexander Monakov changed:
What|Removed |Added
CC||amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114923
Alexander Monakov changed:
What|Removed |Added
CC||amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114923
--- Comment #4 from Alexander Monakov ---
You can place points of possible access outside of the abstract machine in a
fine-grained manner with volatile asms:
asm volatile("" : "=m"(buf));
This cannot be reordered against accesses to volatile va
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: amonakov at gcc dot gnu.org
Target Milestone: ---
Target: x86_64-*-*
#include
__m128i f(__m128i dummy, __m128i x)
{
__v16qi v = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114944
Alexander Monakov changed:
What|Removed |Added
CC||amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114944
--- Comment #4 from Alexander Monakov ---
Like this:
pand xmm1, XMMWORD PTR .LC0[rip]
movaps XMMWORD PTR [rsp-40], xmm0
xor eax, eax
xor edx, edx
movaps XMMWORD PTR [rsp-24], xmm1
mov
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115014
Alexander Monakov changed:
What|Removed |Added
CC||amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115091
Alexander Monakov changed:
What|Removed |Added
CC||amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115132
Alexander Monakov changed:
What|Removed |Added
CC||amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115161
Alexander Monakov changed:
What|Removed |Added
CC||amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115161
--- Comment #18 from Alexander Monakov ---
No, allowing value-changing transformations under -ftrapping-math is really not
appropriate. Invoking the intrinsic on a large floating-point value is not UB.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115161
--- Comment #20 from Alexander Monakov ---
(In reply to Jakub Jelinek from comment #19)
> If we guarantee that we never constant fold FIX/UNSIGNED_FIX with
> -ftrapping-math (we shouldn't, as the exceptions should be raised), then
> using FIX/UN
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115170
Alexander Monakov changed:
What|Removed |Added
CC||amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115161
--- Comment #23 from Alexander Monakov ---
(In reply to Sergei Trofimovich from comment #22)
> Here `pcmpeqd %xmm2,%xmm1` is a problematic instruction. Why does `gcc` use
> `%xmm2` (result of `cvttps2dq`) instead of, say `%xmm0` which contains
>
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115333
Alexander Monakov changed:
What|Removed |Added
CC||amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111768
Alexander Monakov changed:
What|Removed |Added
CC||amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111768
--- Comment #9 from Alexander Monakov ---
(In reply to Arsen Arsenović from comment #8)
> indeed (but I believe it did happen with Alder Lake already, by accident,
> with AVX512 on P-cores but not on E-cores).
AFAIK on those Alder Lake CPUs you
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111768
--- Comment #11 from Alexander Monakov ---
(In reply to Hongtao.liu from comment #10)
> > indeed (but I believe it did happen with Alder Lake already, by accident,
> > with AVX512 on P-cores but not on E-cores).
>
> AVX512 is physically fused o
Severity: normal
Priority: P3
Component: c
Assignee: unassigned at gcc dot gnu.org
Reporter: amonakov at gcc dot gnu.org
Target Milestone: ---
int f(int i)
{
int f = 1;
return i[(unsigned char *)&f];
}
int g(int i)
{
int f = 1;
retu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66487
Alexander Monakov changed:
What|Removed |Added
CC||amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112367
Alexander Monakov changed:
What|Removed |Added
CC||amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82242
--- Comment #5 from Alexander Monakov ---
The small testcase from comment 3 is now improved on trunk, possibly thanks to
work in PR 110215.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110307
Alexander Monakov changed:
What|Removed |Added
CC||uros at gcc dot gnu.org
--- Comment
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111655
--- Comment #13 from Alexander Monakov ---
> Then there is the MULT_EXPR x * x case
This is PR 111701.
It would be nice to clarify what "nonnegative" means in the contracts of this
family of functions, because it's ambiguous for NaNs and negat
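The comment above is truncated, so the following is only a sketch of why "nonnegative" can be read in more than one way for floating-point values. It assumes IEEE 754 doubles and contrasts two plausible definitions: one based on comparison with zero and one based on the sign bit. The function names are hypothetical and are not GCC internals. Including negative zero as a test value is also an assumption about where the truncated sentence was heading.

```c
#include <math.h>      /* signbit, NAN */
#include <stdbool.h>

/* Definition A: the value compares greater than or equal to zero.
   NaN fails every ordered comparison, so NaN is "not nonnegative". */
bool nonneg_by_compare(double x)
{
    return x >= 0.0;
}

/* Definition B: the sign bit is clear.
   -0.0 has its sign bit set, so it is "not nonnegative" here, even
   though it compares equal to 0.0. */
bool nonneg_by_signbit(double x)
{
    return !signbit(x);
}
```

The two definitions disagree on `-0.0`: definition A accepts it and definition B rejects it. For NaN, definition A always rejects, while definition B depends on the NaN's sign bit.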
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112699
Alexander Monakov changed:
What|Removed |Added
CC||amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112699
--- Comment #2 from Alexander Monakov ---
Sorry, even though GCC's limits.h is installed under include-fixed, it is
generated separately, not by the generic fixincludes mechanism. I was confused.
Priority: P3
Component: preprocessor
Assignee: unassigned at gcc dot gnu.org
Reporter: amonakov at gcc dot gnu.org
Target Milestone: ---
In the following snippet, the result of the ternary operator is (-1, cast to an
unsigned type), so the comparison yields
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=07
Alexander Monakov changed:
What|Removed |Added
CC||amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112697
Alexander Monakov changed:
What|Removed |Added
CC||amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112697
--- Comment #8 from Alexander Monakov ---
Thanks, I can reproduce it. It is pretty tricky though. For instance, just
swapping the mov and the compare is enough to make it fast:
--- d.out.ltrans0.ltrans.slow.s 2023-12-01 18:32:54.255841611 +0300
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112697
--- Comment #9 from Alexander Monakov ---
... as does inserting a nop before the compare ¯\_(ツ)_/¯
--- d.out.ltrans0.ltrans.slow.s 2023-12-01 18:32:54.255841611 +0300
+++ d.out.ltrans0.ltrans.s 2023-12-01 18:53:04.909438690 +0300
@@ -743,
: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: amonakov at gcc dot gnu.org
Target Milestone: ---
At -O2 -mfma (x86) or -O3 (arm64) we fail to SLP-vectorize 'f', but succeed in
'g':
double f(double x[], long n)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106902
--- Comment #22 from Alexander Monakov ---
Created attachment 55105
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55105&action=edit
patch 1/3
(In reply to Richard Biener from comment #21)
>
> Sounds reasonable. Though I wouldn't use GE
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106902
--- Comment #25 from Alexander Monakov ---
(In reply to Richard Biener from comment #24)
> As of the patch it looks good, I wonder if we want to check for OPTIMIZE_BOTH
> though since at least when no extra negations are required the contraction
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106902
--- Comment #26 from Alexander Monakov ---
> > Did you run into any of NON_LVALUE / C_MAYBE_CONST wrappings of the
> > multiplication btw?
>
> No, I'm not familiar with those, so I didn't try to construct corresponding
> testcases.
I had a loo
|UNCONFIRMED |RESOLVED
CC||amonakov at gcc dot gnu.org
--- Comment #1 from Alexander Monakov ---
There's a family of similar reports that "late" warnings do not observe this
pragma under LTO. I think it's best to ma
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80922
Alexander Monakov changed:
What|Removed |Added
CC||bruno at clisp dot org
--- Comment #
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109950
Alexander Monakov changed:
What|Removed |Added
CC||amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109944
Alexander Monakov changed:
What|Removed |Added
CC||amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109956
Alexander Monakov changed:
What|Removed |Added
CC||amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109982
Alexander Monakov changed:
What|Removed |Added
CC||amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110007
Alexander Monakov changed:
What|Removed |Added
CC||amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109982
--- Comment #13 from Alexander Monakov ---
No, neither for fields nor for the complete object:
struct
__attribute__((aligned(64)))
S {
int i;
};
void f()
{
struct S s __attribute__((aligned(1))), *p = &s;
int *q = &s.i;
asm(""
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109982
--- Comment #15 from Alexander Monakov ---
For '--float' I think runtime differences are expected when you pass -m flags
that enable FMA, unless you also pass '-ffp-contract=off'.
For '--compiler-attributes' I'd suggest reporting only compiler
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110052
Alexander Monakov changed:
What|Removed |Added
CC||amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110053
Alexander Monakov changed:
What|Removed |Added
CC||amonakov at gcc dot gnu.org
||amonakov at gcc dot gnu.org
--- Comment #1 from Alexander Monakov ---
This is a correctness issue as well due to lack of SFENCE.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110069
Alexander Monakov changed:
What|Removed |Added
CC||amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110052
--- Comment #5 from Alexander Monakov ---
There are other reasons why it's invalid. For instance, in a multi-threaded
program it could introduce a data race on assignment to foo->size inside of
'myrealloc' where the original program might have a
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110087
Alexander Monakov changed:
What|Removed |Added
CC||amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110089
Alexander Monakov changed:
What|Removed |Added
CC||amonakov at gcc dot gnu.org