[Bug target/116738] Constant folding of _mm_min_ss and _mm_max_ss is wrong

2024-09-17 Thread kobalicek.petr at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116738 --- Comment #7 from Petr --- The simplified test case looks good except for a missing return :)

[Bug target/116738] Constant folding of _mm_min_ss and _mm_max_ss is wrong

2024-09-17 Thread kobalicek.petr at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116738 --- Comment #3 from Petr --- Maybe marking it as confirmed would be appropriate then? I think as a workaround it would be better to not constant fold code that GCC cannot compute properly - that would mean properly calculating the values at run

[Bug c++/116738] New: Constant folding of _mm_min_ss and _mm_max_ss is wrong

2024-09-16 Thread kobalicek.petr at gmail dot com via Gcc-bugs
Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: kobalicek.petr at gmail dot com Target Milestone: --- GCC incorrectly optimizes x86 intrinsics, which have a defined operation at the ISA level. It seems that the problem happens when a value is known at compile
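
The "defined operation at the ISA level" mentioned in this report can be written down as a scalar reference model. The snippet above is cut off before its test case, so the following is only a minimal sketch and not the reporter's code; the names `min_ss_ref` and `max_ss_ref` are hypothetical. It assumes the documented behavior of the MINSS/MAXSS instructions on the low lane: the second operand is returned whenever the comparison is false, which covers a NaN in either operand and the case of two zeros.

```cpp
#include <cmath>
#include <limits>

// Hypothetical scalar model of the low lane of _mm_min_ss(a, b).
// MINSS returns the second operand unless a < b holds, so a NaN in
// either operand (or two zeros of any sign) yields b.
inline float min_ss_ref(float a, float b) noexcept { return a < b ? a : b; }

// Hypothetical scalar model of the low lane of _mm_max_ss(a, b).
// MAXSS returns the second operand unless a > b holds.
inline float max_ss_ref(float a, float b) noexcept { return a > b ? a : b; }
```

The operation is not commutative: `min_ss_ref(NaN, 1.0f)` gives `1.0f`, while `min_ss_ref(1.0f, NaN)` gives NaN. A constant folder that treats it like a symmetric minimum would therefore disagree with the hardware result.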

[Bug c++/103699] Reading or writing a constant unaligned value is wrongly optimized causing an incorrect result (GCC-11 and up)

2021-12-14 Thread kobalicek.petr at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103699 --- Comment #17 from Petr --- Guys thanks a lot for your feedback. Is the may_alias annotation guaranteed to behave as expected in the future versions of GCC too, or it's just too much UB that it's better to do unaligned reads with memcpy? Wha
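
The memcpy alternative raised in this comment can be sketched as follows. This is an illustration under stated assumptions, not the code from the bug report: `writeU64be` is a name that appears in the thread, but its body here is hypothetical, as are `readU32be`, `test_u32_memcpy`, and the read offset of 7 (chosen so the result matches the 0xBBCCDDEE value quoted elsewhere in the thread). The byte-order macros and `__builtin_bswap32`/`__builtin_bswap64` are GCC/Clang built-ins.

```cpp
#include <cstdint>
#include <cstring>

// Convert between host order and big-endian (assumes GCC or Clang).
static inline uint32_t hostToBE32(uint32_t x) noexcept {
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
  return __builtin_bswap32(x);
#else
  return x;
#endif
}

static inline uint64_t hostToBE64(uint64_t x) noexcept {
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
  return __builtin_bswap64(x);
#else
  return x;
#endif
}

// Unaligned big-endian load through memcpy: no pointer cast, so no
// alignment or strict-aliasing assumptions are involved.
static inline uint32_t readU32be(const void* p) noexcept {
  uint32_t v;
  std::memcpy(&v, p, sizeof(v));
  return hostToBE32(v);  // the byte swap is its own inverse
}

// Unaligned big-endian store through memcpy (hypothetical body).
static inline void writeU64be(void* p, uint64_t v) noexcept {
  v = hostToBE64(v);
  std::memcpy(p, &v, sizeof(v));
}

// Mirrors the shape of the test described in the thread.
uint32_t test_u32_memcpy() {
  char array[16] {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15};
  writeU64be(array + 6, 0xAABBCCDDEEFF1213);
  return readU32be(array + 7);  // offset 7 is an assumption
}
```

Compilers typically lower such fixed-size `memcpy` calls to single load and store instructions on targets that allow unaligned access, so this form usually costs nothing compared with a cast through an under-aligned typedef.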

[Bug c++/103699] Reading or writing a constant unaligned value is wrongly optimized causing an incorrect result (GCC-11 and up)

2021-12-14 Thread kobalicek.petr at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103699 --- Comment #15 from Petr --- Unfortunately GCC doesn't report any issues even with `-Wstrict-aliasing=1`. BTW now I know I must use the may_alias attribute to my satisfaction, and this is what I'm gonna do, however, from user perspective I'm n

[Bug c++/103699] Reading or writing a constant unaligned value is wrongly optimized causing an incorrect result (GCC-11 and up)

2021-12-14 Thread kobalicek.petr at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103699 --- Comment #12 from Petr --- Is there a way to diagnose this? To tell GCC to report transformations that basically cause wrong results returned? In my code base, I have unaligned memory loads/stores abstracted, so I can implement whatever comp

[Bug c++/103699] Reading or writing a constant unaligned value is wrongly optimized causing an incorrect result (GCC-11 and up)

2021-12-14 Thread kobalicek.petr at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103699 --- Comment #10 from Petr --- Well, the problem is, that when you compile it with "-fsanitize=undefined" - it won't report any undefined behavior, and the function would return the expected value. I even tried to make everything constexpr - and

[Bug c++/103699] Reading or writing a constant unaligned value is wrongly optimized causing an incorrect result (GCC-11 and up)

2021-12-13 Thread kobalicek.petr at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103699 --- Comment #8 from Petr --- My only problem is that A returns a different value compared to B, C, and D: uint32_t test_u32_a() { char array[16] {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15}; writeU64be(array + 6, 0xAABBCCDDEEFF1213); ret

[Bug c++/103699] Reading or writing a constant unaligned value is wrongly optimized causing an incorrect result (GCC-11 and up)

2021-12-13 Thread kobalicek.petr at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103699 --- Comment #6 from Petr --- For now I have disabled unaligned load/store optimizations in my projects when dealing with GCC 11 and upwards. I still think that GCC is wrong in this case regardless of strict aliasing. The code in func_u32() is e

[Bug c++/103699] Reading or writing a constant unaligned value is wrongly optimized causing an incorrect result (GCC-11 and up)

2021-12-13 Thread kobalicek.petr at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103699 --- Comment #4 from Petr --- Additional test case: #include #include typedef uint32_t __attribute__((__aligned__(1))) UnalignedUInt32; typedef uint64_t __attribute__((__aligned__(1))) UnalignedUInt64; uint32_t byteswap32(uint32_t x) noexcep

[Bug c++/103699] Reading or writing a constant unaligned value is wrongly optimized causing an incorrect result (GCC-11 and up)

2021-12-13 Thread kobalicek.petr at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103699 --- Comment #3 from Petr --- BTW this almost seems like an optimizer bug, because if you compile the code without optimizations with GCC 11 (or with -O1) it also returns the expected value - only optimized compilation with GCC 11 returns the wro

[Bug c++/103699] Reading or writing a constant unaligned value is wrongly optimized causing an incorrect result (GCC-11 and up)

2021-12-13 Thread kobalicek.petr at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103699 --- Comment #2 from Petr --- If you compile this with clang the function test_u32() will correctly return the expected 0xBBCCDDEE and not 0x0708090A. If you compile with older GCC, like GCC 10, the test would also return 0xBBCCDDEE. Only GCC-11

[Bug c++/103699] New: Reading or writing unaligned integers is wrongly optimized (GCC-11 and up)

2021-12-13 Thread kobalicek.petr at gmail dot com via Gcc-bugs
Severity: normal Priority: P3 Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: kobalicek.petr at gmail dot com Target Milestone: --- I have found a strange issue. When I use __attribute__((aligned(1))) on a type to essentially annotate its lower alignment

[Bug target/77287] Much worse code generated compared to clang (stack alignment and spills)

2021-08-24 Thread kobalicek.petr at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77287 --- Comment #6 from Petr --- Yes, the code is not really doing anything useful, I only wrote it to demonstrate the spills problem. Clang actually outsmarted me by removing half of the code :) I think this issue can be closed, I cannot repro this

[Bug tree-optimization/87105] Autovectorization [X86, SSE2, AVX2, DoublePrecision]

2018-10-26 Thread kobalicek.petr at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87105 --- Comment #16 from Petr --- Thanks a lot! I hope much more code would benefit from this change.

[Bug tree-optimization/87105] Autovectorization [X86, SSE2, AVX2, DoublePrecision]

2018-08-26 Thread kobalicek.petr at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87105 --- Comment #6 from Petr --- I think the test-case can even be simplified to something like this: #include #include struct Point { double x, y; void reset(double x, double y) { this->x = x; this->y = y; } }; void f1(Point* p,

[Bug tree-optimization/87105] Autovectorization [X86, SSE2, AVX2, DoublePrecision]

2018-08-26 Thread kobalicek.petr at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87105 --- Comment #4 from Petr --- I think this code is vectorizable without -ffast-math. However, it seems that once a min/max (or something else) is kept scalar it poisons the rest of the code. The following code works perfectly (scalar): ``` #incl

[Bug c++/87105] New: Autovectorization [X86, SSE2, AVX2, DoublePrecision]

2018-08-25 Thread kobalicek.petr at gmail dot com
Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: kobalicek.petr at gmail dot com Target Milestone: --- GCC is unable to autovectorize the following code. It seems that it doesn't like min/max, but I'm not entirely sure. I stripped the code off my project

[Bug sanitizer/81870] -fsanitize=undefined doesn't pay attention to __builtin_assume_aligned()

2017-08-16 Thread kobalicek.petr at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81870 --- Comment #2 from Petr --- I see, so if I understand it correctly then: 1. `__builtin_assume_aligned()` should be used to promote the type to a higher than natural alignment, for example 16 bytes for easier auto-vectorization. 2. `__attribute
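
Point 1 of this comment, promoting a pointer to a higher-than-natural alignment, might look like the sketch below. The function name `scale_aligned` and its parameters are hypothetical; only `__builtin_assume_aligned` itself (a GCC/Clang built-in) comes from the thread. The caller is assumed to guarantee the 16-byte alignment; passing a pointer that is not so aligned is undefined behavior.

```cpp
#include <cstddef>

// Multiplies n doubles in place. The caller promises that `data` is
// 16-byte aligned; the built-in only passes that promise on to the
// optimizer and performs no runtime check.
void scale_aligned(double* data, std::size_t n, double factor) noexcept {
  double* p = static_cast<double*>(__builtin_assume_aligned(data, 16));
  for (std::size_t i = 0; i < n; i++)
    p[i] *= factor;
}
```

A caller would typically obtain suitable storage with `alignas(16)` or an aligned allocator before calling such a function.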

[Bug sanitizer/81870] New: -fsanitize=undefined doesn't pay attention to __builtin_assume_aligned()

2017-08-16 Thread kobalicek.petr at gmail dot com
Severity: normal Priority: P3 Component: sanitizer Assignee: unassigned at gcc dot gnu.org Reporter: kobalicek.petr at gmail dot com CC: dodji at gcc dot gnu.org, dvyukov at gcc dot gnu.org, jakub at gcc dot gnu.org, kcc at gcc dot gn

[Bug inline-asm/79880] Gcc refuses to encode vpgatherdd instruction (x86-64)

2017-03-06 Thread kobalicek.petr at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79880 --- Comment #6 from Petr --- Ok, that's fair enough. I didn't know GCC needs an additional option to switch to fully compatible Intel syntax. The code that I posted works fine in clang, so sorry about that. And yes, the instruction will #UD, but

[Bug inline-asm/79880] Gcc refuses to encode vpgatherdd instruction (x86-64)

2017-03-06 Thread kobalicek.petr at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79880 --- Comment #4 from Petr --- In this case, DWORD PTR is redundant; nasm and yasm are fine with the syntax I posted as well. It's a simplified test just to show that it won't pass. Try: __asm(".intel_syntax\n" "vpgatherdd xmm4, dword pt

[Bug inline-asm/79880] New: Gcc refuses to encode vpgatherdd instruction (x86-64)

2017-03-05 Thread kobalicek.petr at gmail dot com
Component: inline-asm Assignee: unassigned at gcc dot gnu.org Reporter: kobalicek.petr at gmail dot com Target Milestone: --- I'm unable to encode `vpgatherdd xmm, mem, xmm` instruction in inline asm: void test() { __asm(".intel_syntax\n" "vpgather

[Bug tree-optimization/79830] GCC generates counterproductive code surrounding very simple loops (improvement request)

2017-03-04 Thread kobalicek.petr at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79830 --- Comment #4 from Petr --- I think the test-case can be simplified to the following code. It still suffers from the same issues as mentioned above. #include #if defined(_MSC_VER) # include #else # include #endif void transform(double* dst

[Bug tree-optimization/79830] GCC generates counterproductive code surrounding very simple loops (improvement request)

2017-03-03 Thread kobalicek.petr at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79830 --- Comment #3 from Petr --- Sorry for misunderstanding, I really read initially that you replaced the exit condition in the sample code :)

[Bug tree-optimization/79830] GCC generates counterproductive code surrounding very simple loops (improvement request)

2017-03-03 Thread kobalicek.petr at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79830 --- Comment #2 from Petr --- I'm not sure I follow with the exit test. I mean the code should be correct as each point has x|y coord, which is two doubles, so length 8 means 16 doubles (I converted from my production code into a simpler form that

[Bug c++/79830] New: GCC generates counterproductive code surrounding very simple loops (improvement request)

2017-03-03 Thread kobalicek.petr at gmail dot com
Severity: normal Priority: P3 Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: kobalicek.petr at gmail dot com Target Milestone: --- It seems that GCC tries very hard to optimize loops, but in my case it's counterproductive. I

[Bug rtl-optimization/77287] Much worse code generated compared to clang (stack alignment and spills)

2016-08-20 Thread kobalicek.petr at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77287 --- Comment #4 from Petr --- Adding -fschedule-insns is definitely a huge improvement in this case. I wonder why this doesn't happen by default at -O2 and -Os, as it really improves things and makes shorter output, or it's just in this particular

[Bug target/77287] Much worse code generated compared to clang (stack alignment and spills)

2016-08-18 Thread kobalicek.petr at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77287 --- Comment #2 from Petr --- With '-mtune=intel' the push/pop sequence is gone, but YMM register management remains the same - 24 memory accesses more than clang.

[Bug c++/77287] New: Much worse code generated compared to clang (stack alignment and spills)

2016-08-18 Thread kobalicek.petr at gmail dot com
Severity: normal Priority: P3 Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: kobalicek.petr at gmail dot com Target Milestone: --- A simple function (artificial code): #include int fn( const int* px, const int* py, const int* pz, const int* pw

[Bug target/70708] Suboptimal code generated when using _mm_set_sd (X64)

2016-04-18 Thread kobalicek.petr at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70708 --- Comment #3 from Petr --- Is there any workaround guys? I was looking for some built-in that would allow me just cast `double` to `__m128d` without going through `_mm_set_sd()`, but leaving the high part undefined.

[Bug c++/70708] New: Suboptimal code generated when using _mm_set_sd (X64)

2016-04-17 Thread kobalicek.petr at gmail dot com
Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: kobalicek.petr at gmail dot com Target Milestone: --- The ABI already uses XMM registers for floating point operations. Compare the following two snippets: double MyMinV1(double a, double b) { return a < b