https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100363
Linus Torvalds changed:
What | Removed | Added
CC   |         | torvalds@linux-foundation.o
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100363
--- Comment #8 from Linus Torvalds ---
(In reply to Alexander Monakov from comment #7)
>
> Most likely the issue is that sout/sfrom are misaligned at runtime, while
> the vectorized code somewhere relies on them being sufficiently aligned for
>
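When buffers such as sout/sfrom may be misaligned, C code usually stays correct by loading and storing through memcpy instead of dereferencing a cast 16-bit pointer. The sketch below shows that approach. The helper names load_u16_unaligned and copy_u16_units are hypothetical. They are not taken from zlib or from the kernel.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: read a 16-bit value from an address with no
 * alignment guarantee. memcpy carries no alignment assumption, so the
 * compiler may only use instructions that are valid for any address
 * (often still a single load on targets that permit unaligned access). */
static inline uint16_t load_u16_unaligned(const unsigned char *p)
{
    uint16_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Hypothetical copy loop: 'out' and 'from' may both be misaligned.
 * Casting them to uint16_t * and dereferencing would be undefined
 * behaviour, and may let the optimizer assume an alignment that does
 * not hold at runtime. */
static void copy_u16_units(unsigned char *out, const unsigned char *from,
                           size_t units)
{
    assert(units <= SIZE_MAX / 2);  /* 2 * units must not overflow */
    for (size_t i = 0; i < units; i++) {
        uint16_t v = load_u16_unaligned(from + 2 * i);
        memcpy(out + 2 * i, &v, sizeof v);
    }
}
```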
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100363
--- Comment #10 from Linus Torvalds ---
(In reply to Richard Biener from comment #9)
>
> Note alignment has nothing to do with strict-aliasing (-fno-strict-aliasing
> you mean btw).
I obviously meant -fno-strict-aliasing, yes.
But I think it'
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100363
--- Comment #11 from Linus Torvalds ---
(In reply to Linus Torvalds from comment #10)
>
> This particular code comes
> from some old version of zlib, and I can't test because I don't have the ARC
> background to make any sense of the gene
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100363
--- Comment #14 from Linus Torvalds ---
(In reply to Vineet Gupta from comment #13)
> Sorry the workaround proposed by Alexander doesn't seem to cure it (patch
> attached), outcome is the same
Vineet - it's not the ldd/std that is necessarily b
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108552
--- Comment #12 from Linus Torvalds ---
So it might be worth pointing explicitly to Vlastimil's email at
https://lore.kernel.org/all/2b857e20-5e3a-13ec-a0b0-1f69d2d04...@suse.cz/
which has annotated objdump output and seems to point to the a
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108552
--- Comment #30 from Linus Torvalds ---
(In reply to Richard Biener from comment #26)
> And yes, to IV optimization the gcov counter for the loop body is just
> another IV candidate that can be used, and in this case it allows to elide
> the oth
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108552
--- Comment #31 from Linus Torvalds ---
(In reply to Richard Biener from comment #26)
>
> Now, in principle we should have applied store-motion and not only PRE which
> would have avoided the issue, not tricking the RA into reloading the value
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108552
--- Comment #32 from Linus Torvalds ---
Btw, where does the -fprofile-update=single/atomic come from?
The kernel just uses
CFLAGS_GCOV:= -fprofile-arcs -ftest-coverage
for this case. So I guess 'single' is just the default value?
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108552
--- Comment #43 from Linus Torvalds ---
(In reply to Richard Biener from comment #42)
>
> I think if we want to avoid doing optimizations on gcov counters we should
> make them volatile.
Honestly, that sounds like the cleanest and safest opti
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108552
--- Comment #47 from Linus Torvalds ---
(In reply to Richard Biener from comment #45)
> For user code
>
> volatile long long x;
> void foo () { x++; }
>
> emitting inc + adc with memory operands is only "incorrect" in re-ordering
> the subword
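Comments #42 and #43 discuss making the counters volatile. The sketch below illustrates that idea with a hypothetical counter named arc_counter. Real gcov counters are emitted by the compiler and are not declared in source. With volatile, each increment must be a real load and store, so the counter cannot be kept in a register or reused as an induction variable. As comment #45 notes, volatile does not make a 64-bit increment atomic. On 32-bit x86 the increment is still done in two 32-bit halves, as in the inc + adc sequence quoted there.

```c
#include <assert.h>

/* Hypothetical stand-in for a gcov arc counter. Because it is volatile,
 * every increment is an actual read-modify-write of memory: the compiler
 * may not cache it in a register across the loop, merge increments, or
 * derive it from another induction variable. */
static volatile long long arc_counter;

static int sum_with_coverage(const int *a, int n)
{
    int sum = 0;
    assert(n >= 0);
    for (int i = 0; i < n; i++) {
        arc_counter++;  /* one volatile read-modify-write per iteration */
        sum += a[i];
    }
    return sum;
}
```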
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106471
Bug ID: 106471
Summary: Strange code generation for __builtin_ctzl()
Product: gcc
Version: 12.1.1
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106471
--- Comment #1 from Linus Torvalds ---
Created attachment 53379
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53379&action=edit
Silly test-case as an attachment too
I expected just
rep bsfq %rdi, %rax
ret
from this, but
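The attached test-case is not reproduced here. The function below is a hypothetical equivalent of the kind of code involved. __builtin_ctzl(0) has an undefined result, so a plain bsf is enough for valid inputs, and the sketch asserts that its argument is nonzero. Whether GCC emits rep bsf alone or precedes it with an xor depends on the tuning discussed in the next comments.

```c
#include <assert.h>

/* Index of the lowest set bit of x, i.e. its count of trailing zero bits.
 * __builtin_ctzl(0) is undefined, so callers must pass a nonzero value.
 * On x86-64, GCC typically lowers this to bsf or to "rep bsf" (the tzcnt
 * encoding), possibly preceded by an xor that clears the destination
 * register, depending on the selected tuning. */
static unsigned int lowest_set_bit(unsigned long x)
{
    assert(x != 0);
    return (unsigned int)__builtin_ctzl(x);
}
```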
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106471
--- Comment #5 from Linus Torvalds ---
(In reply to Andrew Pinski from comment #2)
> The xor is needed because of an errata in some Intel cores.
The only errata I'm aware of is that tzcnt can act as tzcnt even when cpuid
doesn't enumerate it (s
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106471
--- Comment #6 from Linus Torvalds ---
Ahh, crossed comments.
(In reply to Andrew Pinski from comment #3)
> The xor is due to X86_TUNE_AVOID_FALSE_DEP_FOR_BMI setting:
>
> /* X86_TUNE_AVOID_FALSE_DEP_FOR_BMI: Avoid false dependency
>    for bi
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105930
Bug ID: 105930
Summary: Excessive stack spill generation on 32-bit x86
Product: gcc
Version: 12.1.1
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105930
--- Comment #1 from Linus Torvalds ---
Side note: it might be best to clarify that this is a regression specific to
gcc-12.
Gcc 11.3 doesn't have the problem, and generates code for this same test-case
with a stack frame of only 428 bytes. That
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105930
--- Comment #3 from Linus Torvalds ---
Created attachment 53123
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53123&action=edit
Mindless revert that fixes things for me
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105930
--- Comment #4 from Linus Torvalds ---
So hey, since you guys use git now, I thought I might as well just bisect this.
Now, I have no idea what the best and most efficient way is to generate only
"cc1", so my bisection run was this unholy mess
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105930
--- Comment #5 from Linus Torvalds ---
(In reply to Linus Torvalds from comment #4)
>
> I'm not proud of that hacky thing, but since gcc documentation is written
> in Sanskrit, and mere mortals can't figure it out, it's the best I could do.
A
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105930
--- Comment #8 from Linus Torvalds ---
(In reply to Roger Sayle from comment #7)
> Investigating. Adding -mno-stv the stack size reduces from 2612 to 428 (and
> on godbolt the number of assembler lines reduces from 6952 to 6203).
Thanks. Using
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105930
--- Comment #9 from Linus Torvalds ---
Looks like STV is "scalar to vector" and it should have been disabled
automatically by the -mno-avx flag anyway.
And the excessive stack usage was perhaps due to GCC preparing all those stack
slots for int
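The example below is a hypothetical reduced case in the same spirit. It is neither the test-case from the report nor the one behind the godbolt link, and it may or may not reproduce the regression. It runs a loop of 64-bit adds, xors and rotates. One way to see what the STV pass does with DImode values on 32-bit x86 is to compile it with gcc -m32 -O2, with and without -mno-stv, and compare the generated stack frames. The names rotl64 and mix_rounds are illustrative only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Rotate left by n bits. Masking both shift counts keeps the shifts
 * defined for every n, including n == 0. */
static inline uint64_t rotl64(uint64_t x, unsigned int n)
{
    return (x << (n & 63)) | (x >> (-n & 63));
}

/* Hypothetical mixing rounds over four 64-bit words. On 32-bit x86 each
 * uint64_t is a DImode value that is either split into a pair of 32-bit
 * general registers or, when STV considers it profitable, moved into a
 * vector register. */
static void mix_rounds(uint64_t v[4], unsigned int rounds)
{
    assert(v != NULL);
    for (unsigned int r = 0; r < rounds; r++) {
        v[0] += v[1];
        v[1] = rotl64(v[1], 13) ^ v[0];
        v[2] += v[3];
        v[3] = rotl64(v[3], 16) ^ v[2];
        v[0] = rotl64(v[0], 32);
    }
}
```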
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105930
--- Comment #10 from Linus Torvalds ---
(In reply to Roger Sayle from comment #7)
> Investigating. Adding -mno-stv the stack size reduces from 2612 to 428 (and
> on godbolt the number of assembler lines reduces from 6952 to 6203).
So now that
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105930
--- Comment #12 from Linus Torvalds ---
(In reply to Jakub Jelinek from comment #11)
> Anyway, I think we need to understand what makes it spill that much more,
> and unfortunately the testcase is too large to find that out easily, I think
> we
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105930
--- Comment #14 from Linus Torvalds ---
(In reply to Samuel Neves from comment #13)
> Something simple like this -- https://godbolt.org/z/61orYdjK7 -- already
> exhibits the effect.
Yup.
That's a much better test-case. I think you should atta
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105930
--- Comment #21 from Linus Torvalds ---
(In reply to CVS Commits from comment #20)
>
> One might think
> that splitting early gives the register allocator more freedom to
> use available registers, but in practice the constraint
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105930
--- Comment #23 from Linus Torvalds ---
(In reply to Jakub Jelinek from comment #22)
>
> If the wider registers are narrowed before register allocation, it is just
> a pair like (reg:SI 123) (reg:SI 256) and it can be allowed anywhere.
That wa
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105930
--- Comment #24 from Linus Torvalds ---
(In reply to Linus Torvalds from comment #23)
>
> And this now brings back my memory of the earlier similar discussion - it
> wasn't about DImode code generation, it was about bitfield code generation
> b
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105930
--- Comment #28 from Linus Torvalds ---
(In reply to Roger Sayle from comment #27)
> This should now be fixed on both mainline and the GCC 12 release branch.
Thanks everybody.
Looks like the xchg optimization isn't in the gcc-12 release branch
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113921
--- Comment #2 from Linus Torvalds ---
(In reply to Jakub Jelinek from comment #1)
> Bisection points to r12-5301-g045206450386bcd774db3bde0c696828402361c6
> making the problem go away,
Well, that certainly explains why I can't see the problem
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113921
--- Comment #3 from Linus Torvalds ---
(In reply to Linus Torvalds from comment #2)
>
> So we could make our workaround option be something like
>
> config GCC_ASM_GOTO_WORKAROUND
> def_bool y
> depends on CC_IS_GCC && GCC_VERSION
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113921
--- Comment #5 from Linus Torvalds ---
(In reply to Linus Torvalds from comment #2)
>
> So we could make our workaround option be something like
>
> config GCC_ASM_GOTO_WORKAROUND
> def_bool y
> depends on CC_IS_GCC && GCC_VERSION
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111901
Bug ID: 111901
Summary: Apparently bogus CSE of inline asm with memory clobber
Product: gcc
Version: 13.2.1
Status: UNCONFIRMED
Severity: normal
Priority: P3
Co
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111901
--- Comment #3 from Linus Torvalds ---
(In reply to Andrew Pinski from comment #1)
> I suspect without an input, the cse will happen as there is no other writes
> in the loop.
Yes, it looks to me like the CSE simply didn't think of the memory c
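For reference, here is a minimal sketch of the conventional compiler-barrier idiom, which is an asm with a "memory" clobber. The names compiler_barrier, flag and poll_flag are hypothetical. GCC treats an asm statement without output operands as implicitly volatile. The CSE discussed in this report can therefore only apply to a non-volatile asm that has outputs. Writing asm volatile is the conservative way to rule that out.

```c
#include <assert.h>

/* Compiler barrier: an empty asm with a "memory" clobber. GCC must assume
 * the asm may read or write any memory, so values cached in registers
 * (such as 'flag' below) have to be reloaded after it. An asm with no
 * outputs is implicitly volatile; the explicit volatile just documents
 * that the statement must not be deleted, hoisted or merged. */
#define compiler_barrier() __asm__ __volatile__("" ::: "memory")

static int flag;

/* Hypothetical polling loop: returns the iteration at which 'flag' was
 * seen set, or -1 after max_spins iterations. Real cross-thread signalling
 * would also need atomics; the barrier only constrains the compiler. */
static int poll_flag(int max_spins)
{
    assert(max_spins >= 0);
    for (int i = 0; i < max_spins; i++) {
        compiler_barrier();
        if (flag)
            return i;
    }
    return -1;
}
```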
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119279
--- Comment #2 from Linus Torvalds ---
(In reply to Andrew Pinski from comment #1)
>
> Why does the call need to be done in the inline-asm?
Typically it's the fallback alternative for when the primary inline asm doesn't
work.
I.e., the "real" asm
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119279
--- Comment #6 from Linus Torvalds ---
(In reply to Jakub Jelinek from comment #5)
> Call instructions are normally valid anywhere in the function, including
> prologue and epilogue, even with frame pointers.
Sure, the call instruction actually
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119279
--- Comment #4 from Linus Torvalds ---
(In reply to Richard Biener from comment #3)
> I think
>
> asm ("" : : "g" (__builtin_frame_address(0)))
>
> and using that input as frame pointer looks spot-on semantically, is that
> what you are actua
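The sketch below shows the construct suggested in comment #3, with the real asm replaced by an empty template. The function name is hypothetical, and this is not the kernel's code. Feeding __builtin_frame_address(0) to the asm as an input makes the statement depend on the current frame, which typically causes GCC to set up a frame pointer for the function. The "g" constraint lets the compiler pass the value in a register, in memory, or as an immediate.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical wrapper: the empty template stands in for an asm that
 * contains a call. Passing the frame address as an input ties the asm to
 * the current frame. An asm without outputs is already implicitly
 * volatile; the explicit keyword is kept for clarity. */
static void *asm_with_frame_input(void)
{
    void *fp = __builtin_frame_address(0);
    assert(fp != NULL);
    __asm__ __volatile__("" : : "g"(fp));
    return fp;
}
```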