https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116738
--- Comment #7 from Petr ---
The simplified test case looks good except for a missing return :)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116738
--- Comment #3 from Petr ---
Maybe marking it as confirmed would be appropriate then?
I think as a workaround it would be better to not constant fold code that GCC
cannot compute properly - that would mean properly calculating the values at
run
Component: c++
Assignee: unassigned at gcc dot gnu.org
Reporter: kobalicek.petr at gmail dot com
Target Milestone: ---
GCC incorrectly optimizes x86 intrinsics, which have a defined operation at the
ISA level. It seems that the problem happens when a value is known at compile
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103699
--- Comment #17 from Petr ---
Guys, thanks a lot for your feedback.
Is the may_alias annotation guaranteed to behave as expected in future
versions of GCC too, or is it so much UB that it's better to do unaligned
reads with memcpy?
Wha
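For illustration only, a minimal sketch of the memcpy-based approach asked about above. Only the name writeU64be appears in this thread; readU32be, the hostToBe helpers, test_u32_memcpy, and the read offset of 7 are assumptions chosen so the result matches the expected value quoted in the thread (0xBBCCDDEE). It relies on the GCC/Clang builtins __builtin_bswap32/64 and the __BYTE_ORDER__ macro.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helpers: convert between host order and big-endian.
// The swap is its own inverse, so one function serves both directions.
static inline uint32_t hostToBe32(uint32_t x) {
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
  return __builtin_bswap32(x);
#else
  return x;
#endif
}

static inline uint64_t hostToBe64(uint64_t x) {
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
  return __builtin_bswap64(x);
#else
  return x;
#endif
}

// Unaligned big-endian load: memcpy has no alignment or aliasing
// requirements, and compilers typically lower it to a single load.
static inline uint32_t readU32be(const void* p) {
  uint32_t v;
  std::memcpy(&v, p, sizeof(v));
  return hostToBe32(v);
}

// Unaligned big-endian store, same idea in the other direction.
static inline void writeU64be(void* p, uint64_t v) {
  v = hostToBe64(v);
  std::memcpy(p, &v, sizeof(v));
}

// Assumed shape of the test: write 8 bytes at offset 6, read 4 at offset 7.
uint32_t test_u32_memcpy() {
  char array[16] = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15};
  writeU64be(array + 6, 0xAABBCCDDEEFF1213ull);
  return readU32be(array + 7);
}
```

Because every access goes through memcpy, this variant does not depend on may_alias or on aligned(1) typedefs at all.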
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103699
--- Comment #15 from Petr ---
Unfortunately GCC doesn't report any issues even with `-Wstrict-aliasing=1`.
BTW now I know I must use the may_alias attribute to my satisfaction, and this
is what I'm gonna do, however, from user perspective I'm n
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103699
--- Comment #12 from Petr ---
Is there a way to diagnose this? To tell GCC to report transformations that
basically cause wrong results returned?
In my code base, I have unaligned memory loads/stores abstracted, so I can
implement whatever comp
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103699
--- Comment #10 from Petr ---
Well, the problem is that when you compile it with "-fsanitize=undefined", it
won't report any undefined behavior, and the function returns the expected
value.
I even tried to make everything constexpr - and
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103699
--- Comment #8 from Petr ---
My only problem is that A returns a different value compared to B, C, and D:
uint32_t test_u32_a() {
char array[16] {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15};
writeU64be(array + 6, 0xAABBCCDDEEFF1213);
ret
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103699
--- Comment #6 from Petr ---
For now I have disabled unaligned load/store optimizations in my projects when
dealing with GCC 11 and upwards.
I still think that GCC is wrong in this case regardless of strict aliasing. The
code in func_u32() is e
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103699
--- Comment #4 from Petr ---
Additional test case:
#include
#include
typedef uint32_t __attribute__((__aligned__(1))) UnalignedUInt32;
typedef uint64_t __attribute__((__aligned__(1))) UnalignedUInt64;
uint32_t byteswap32(uint32_t x) noexcep
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103699
--- Comment #3 from Petr ---
BTW this almost seems like an optimizer bug, because if you compile the code
without optimizations with GCC 11 (or with -O1) it also returns the expected
value - only optimized compilation with GCC 11 returns the wro
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103699
--- Comment #2 from Petr ---
If you compile this with clang the function test_u32() will correctly return the
expected 0xBBCCDDEE and not 0x0708090A.
If you compile with older GCC, like GCC 10, the test would also return
0xBBCCDDEE. Only GCC-11
: normal
Priority: P3
Component: c++
Assignee: unassigned at gcc dot gnu.org
Reporter: kobalicek.petr at gmail dot com
Target Milestone: ---
I have found a strange issue. When I use __attribute__((aligned(1))) on a type
to essentially annotate its lower alignment
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77287
--- Comment #6 from Petr ---
Yes, the code is not really doing anything useful, I only wrote it to
demonstrate the spills problem. Clang actually outsmarted me by removing half
of the code :)
I think this issue can be closed, I cannot repro this
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87105
--- Comment #16 from Petr ---
Thanks a lot! I hope much more code would benefit from this change.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87105
--- Comment #6 from Petr ---
I think the test-case can even be simplified to something like this:
#include
#include
struct Point {
double x, y;
void reset(double x, double y) {
this->x = x;
this->y = y;
}
};
void f1(Point* p,
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87105
--- Comment #4 from Petr ---
I think this code is vectorizable without -ffast-math. However, it seems that
once a min/max (or something else) is kept scalar it poisons the rest of the
code.
The following code works perfectly (scalar):
```
#incl
Component: c++
Assignee: unassigned at gcc dot gnu.org
Reporter: kobalicek.petr at gmail dot com
Target Milestone: ---
GCC is unable to autovectorize the following code. It seems that it doesn't
like min/max, but I'm not entirely sure. I stripped the code off my project
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81870
--- Comment #2 from Petr ---
I see, so if I understand it correctly then:
1. `__builtin_assume_aligned()` should be used to promote the type to a higher
than natural alignment, for example 16 bytes for easier auto-vectorization.
2. `__attribute
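A minimal sketch of point 1, assuming a caller that really does pass a 16-byte-aligned pointer. The function name scaleAligned16 and its parameters are hypothetical; only `__builtin_assume_aligned()` itself comes from the comment above.

```cpp
#include <cstddef>

// Hypothetical example: scale n floats in place. The builtin returns the
// same pointer, but lets the optimizer assume it is 16-byte aligned.
// Passing a pointer that is not 16-byte aligned is undefined behavior.
void scaleAligned16(float* data, std::size_t n, float k) {
  float* p = static_cast<float*>(__builtin_assume_aligned(data, 16));
  for (std::size_t i = 0; i < n; i++) {
    p[i] *= k;
  }
}
```

The builtin only conveys a promise to the compiler; it does not align anything, so the buffer itself still has to be allocated with the promised alignment (for example with alignas(16)).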
erity: normal
Priority: P3
Component: sanitizer
Assignee: unassigned at gcc dot gnu.org
Reporter: kobalicek.petr at gmail dot com
CC: dodji at gcc dot gnu.org, dvyukov at gcc dot gnu.org,
jakub at gcc dot gnu.org, kcc at gcc dot gn
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79880
--- Comment #6 from Petr ---
Ok, that's fair enough. I didn't know GCC needs an additional option to switch
to fully compatible Intel syntax. The code that I posted works fine in clang,
so sorry about that.
And yes, the instruction will #UD, but
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79880
--- Comment #4 from Petr ---
In this case, DWORD PTR is redundant, nasm and yasm are fine with the syntax I
posted as well.
It's a simplified test just to show that it won't pass.
Try:
__asm(".intel_syntax\n"
"vpgatherdd xmm4, dword pt
Component: inline-asm
Assignee: unassigned at gcc dot gnu.org
Reporter: kobalicek.petr at gmail dot com
Target Milestone: ---
I'm unable to encode `vpgatherdd xmm, mem, xmm` instruction in inline asm:
void test() {
__asm(".intel_syntax\n"
"vpgather
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79830
--- Comment #4 from Petr ---
I think the test-case can be simplified to the following code. It still suffers
from the same issues as mentioned above.
#include
#if defined(_MSC_VER)
# include
#else
# include
#endif
void transform(double* dst
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79830
--- Comment #3 from Petr ---
Sorry for the misunderstanding, I initially read it as if you had replaced the
exit condition in the sample code :)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79830
--- Comment #2 from Petr ---
I'm not sure I follow with the exit test. I mean the code should be correct as
each point has x|y coord, which is two doubles, so length 8 means 16 doubles (I
converted from my production code into a simpler form that
Severity: normal
Priority: P3
Component: c++
Assignee: unassigned at gcc dot gnu.org
Reporter: kobalicek.petr at gmail dot com
Target Milestone: ---
It seems that GCC tries very hard to optimize loops, but in my case it's
counterproductive. I
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77287
--- Comment #4 from Petr ---
Adding -fschedule-insns is definitely a huge improvement in this case. I wonder
why this doesn't happen by default at -O2 and -Os, as it really improves things
and makes shorter output, or it's just in this particular
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77287
--- Comment #2 from Petr ---
With '-mtune=intel' the push/pop sequence is gone, but YMM register management
remains the same - 24 memory accesses more than clang.
: normal
Priority: P3
Component: c++
Assignee: unassigned at gcc dot gnu.org
Reporter: kobalicek.petr at gmail dot com
Target Milestone: ---
A simple function (artificial code):
#include
int fn(
const int* px, const int* py,
const int* pz, const int* pw
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70708
--- Comment #3 from Petr ---
Is there any workaround, guys?
I was looking for some built-in that would allow me just cast `double` to
`__m128d` without going through `_mm_set_sd()`, but leaving the high part
undefined.
Component: c++
Assignee: unassigned at gcc dot gnu.org
Reporter: kobalicek.petr at gmail dot com
Target Milestone: ---
The ABI already uses XMM registers for floating point operations. Compare the
following two snippets:
double MyMinV1(double a, double b) {
return a < b