Almost an order of magnitude faster __udivmodti4() for AMD64

2020-08-10 Thread Stefan Kanthak
Hi @ll, I don't use GCC, so I don't know whether there's a benchmark for __udivmodti4() and/or __udivmoddi4() for AMD64 and i386 processors. If you have one: get my "slow" __udivmodti4() from and run the benchmark, then my fast __udivmodti

Peephole optimisation: isWhitespace()

2020-08-14 Thread Stefan Kanthak
') shr eax, cl ; eax >>= (c % ' ') xor edx, edx cmp ecx, 33 ; CF = c <= ' ' adc edx, edx ; edx = (c <= ' ') and eax, edx ret regards Stefan Kanthak
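
The shift-and-mask idea in the excerpt above can be sketched in C. This is only an illustration: the function names are hypothetical, and the four characters tested are an assumption about the function from the quoted article, since the excerpt is cut off before its body.

```c
#include <stdbool.h>
#include <stdint.h>

/* Straightforward form: four comparisons.
   Assumption: these are the characters the article's function tests. */
static bool is_whitespace_naive(char c)
{
    return c == ' ' || c == '\r' || c == '\n' || c == '\t';
}

/* Bitmask form: bit n of the mask is set when character code n is
   '\t' (9), '\n' (10), '\r' (13) or ' ' (32). */
static bool is_whitespace_mask(char c)
{
    const uint64_t mask = (1ULL << '\t') | (1ULL << '\n')
                        | (1ULL << '\r') | (1ULL << ' ');
    unsigned char u = (unsigned char)c;

    /* Range check first, so the shift count always stays below 64. */
    return u <= ' ' && ((mask >> u) & 1) != 0;
}
```

Both functions should agree for every character value; which one a compiler turns into shorter or faster code is exactly what the thread argues about, and this sketch makes no claim either way.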

Re: Peephole optimisation: isWhitespace()

2020-08-16 Thread Stefan Kanthak
"Nathan Sidwell" wrote: > On 8/14/20 12:43 PM, Stefan Kanthak wrote: >> Hi @ll, >> >> in his ACM queue article <https://queue.acm.org/detail.cfm?id=3372264>, >> Matt Godbolt used the function >> >> | bool isWhitespace(char c)

Re: Peephole optimisation: isWhitespace()

2020-08-17 Thread Stefan Kanthak
"Nathan Sidwell" > On 8/16/20 9:54 AM, Stefan Kanthak wrote: >> "Nathan Sidwell" wrote: [...] >>> Have you benchmarked it? >> >> Of course! Did you? [...] > you seem very angry about being asked for data. As much as you hallucinated

Re: Peephole optimisation: isWhitespace()

2020-08-17 Thread Stefan Kanthak
"Allan Sandfeld Jensen" wrote: > On Friday, 14 August 2020 18:43:12 CEST Stefan Kanthak wrote: >> Hi @ll, >> >> in his ACM queue article <https://queue.acm.org/detail.cfm?id=3372264>, >> Matt Godbolt used the function >> >> | b

Re: Peephole optimisation: isWhitespace()

2020-08-24 Thread Stefan Kanthak
"Richard Biener" wrote: > On Mon, Aug 17, 2020 at 7:09 PM Stefan Kanthak > wrote: >> >> "Allan Sandfeld Jensen" wrote: >> >> > On Friday, 14 August 2020 18:43:12 CEST Stefan Kanthak wrote: >> >> Hi @ll, >> >>

Re: Peephole optimisation: isWhitespace()

2020-08-24 Thread Stefan Kanthak
"Richard Biener" wrote: > On Mon, Aug 24, 2020 at 1:22 PM Stefan Kanthak > wrote: >> >> "Richard Biener" wrote: >> >> > On Mon, Aug 17, 2020 at 7:09 PM Stefan Kanthak >> > wrote: >> >> >> >> "Al

Re: Peephole optimisation: isWhitespace()

2020-08-25 Thread Stefan Kanthak
I wrote: > "Richard Biener" wrote: [...] >> Whether or not the branch is predicted taken does not matter, what >> matters is that the continuation is not data dependent on the branch >> target computation and thus can execute in parallel to it. > > My benchmark shows that this doesn't matter!

Missed optimisation in __udivmoddi4 of libgcc2

2020-09-13 Thread Stefan Kanthak
libgcc2 provides "double-word" division as __udivmoddi4() The following part of its source | UWtype d0, d1, n0, n1, n2; | UWtype b, bm; ... | count_leading_zeros (bm, d1); | if (bm == 0) ... | else | { | UWtype m1, m0; | /* Normalize. */ | | b = W_TYPE_SIZE - bm;

UB or !UB? Plus poor register allocation

2020-10-01 Thread Stefan Kanthak
The following source implements the __absv?i2() functions (see <https://gcc.gnu.org/onlinedocs/gccint/Integer-library-routines.html>) for 32-bit, 64-bit and 128-bit integers in 3 different ways: --- ub_or_!ub.c --- // Copyleft 2014-2020, Stefan Kanthak #ifdef __amd64__ __int128_t __a

[Patch] Overflow-trapping integer arithmetic routines7code: bloated and slooooow

2020-10-05 Thread Stefan Kanthak
The implementation of the functions __absv?i2(), __addv?i3() etc. for trapping integer overflow provided in libgcc2.c is rather bad. Same for __cmp?i2() and __ucmp?i2() GCC creates awful to horrible code for them (at least for AMD64 and i386 processors): see
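
For readers unfamiliar with these routines, here is a minimal sketch of what "overflow-trapping" means for the 32-bit cases. The names absv_sketch and addv_sketch are hypothetical; this is neither the libgcc2.c code nor the patch proposed in the message, only an illustration using abort() and GCC's __builtin_add_overflow().

```c
#include <limits.h>
#include <stdlib.h>

/* Absolute value that traps instead of overflowing:
   -INT_MIN is not representable in int. */
static int absv_sketch(int a)
{
    if (a == INT_MIN)
        abort();
    return a < 0 ? -a : a;
}

/* Addition that traps on signed overflow.
   __builtin_add_overflow() is a GCC/Clang builtin that returns
   nonzero when the mathematically exact sum does not fit in *res. */
static int addv_sketch(int a, int b)
{
    int sum;
    if (__builtin_add_overflow(a, b, &sum))
        abort();
    return sum;
}
```

Calling absv_sketch(INT_MIN) or addv_sketch(INT_MAX, 1) terminates the program; every other input returns the ordinary result.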

[__mulvti3] register allocator plays shell game

2020-10-25 Thread Stefan Kanthak
, 63 cmp r8, rsi jne __mulvti3+0x48+65-31 cmp r9, rcx jne __mulvti3+0xa0+65-31 mov rax, rdi imul rdx ret ... not amused Stefan Kanthak

Re: [__mulvti3] register allocator plays shell game

2020-10-26 Thread Stefan Kanthak
Richard Biener wrote: > On Sun, Oct 25, 2020 at 8:37 PM Stefan Kanthak > wrote: >> >> Hi, >> >> for the AMD64 alias x86_64 platform and the __int128_t [DW]type, >> the first few lines of the __mulvDI3() function from libgcc2.c >> >

Re: [__mulvti3] register allocator plays shell game

2020-10-27 Thread Stefan Kanthak
Richard Biener wrote: > On Tue, Oct 27, 2020 at 12:01 AM Stefan Kanthak > wrote: >> >> Richard Biener wrote: >> >>> On Sun, Oct 25, 2020 at 8:37 PM Stefan Kanthak >>> wrote: >>>> >>>> Hi, >>>> >>>> fo

[libgcc2.c] Implementation of __bswapsi2()

2020-11-12 Thread Stefan Kanthak
nt w) { return (v >> (31 & w)) | (v << (31 & -w)); } int __bswapsi2 (int u) // should better be unsigned __bswapsi2 (unsigned u)! { return __rotlsi3 (u & 0xff00ff00, 8) | __rotrsi3 (u & 0x00ff00ff, 8); } Stefan KanthaK PS: reimplementing __bswapdi2(
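
The rotate-based byte swap quoted above can be tried in isolation. The sketch below uses unsigned 32-bit types throughout, as the message itself suggests would be better; the names rotl32, rotr32 and bswap32_rot are hypothetical stand-ins for the libgcc names.

```c
#include <stdint.h>

/* Rotate left by w bits; masking with 31 keeps both shift counts in range. */
static uint32_t rotl32(uint32_t v, unsigned w)
{
    return (v << (31 & w)) | (v >> (31 & -w));
}

/* Rotate right by w bits. */
static uint32_t rotr32(uint32_t v, unsigned w)
{
    return (v >> (31 & w)) | (v << (31 & -w));
}

/* Byte swap: bytes 3 and 1 rotate left by 8 into positions 0 and 2,
   bytes 2 and 0 rotate right by 8 into positions 1 and 3. */
static uint32_t bswap32_rot(uint32_t u)
{
    return rotl32(u & 0xff00ff00u, 8) | rotr32(u & 0x00ff00ffu, 8);
}
```

For example, bswap32_rot(0xAABBCCDD) yields 0xDDCCBBAA: the first term contributes 0x00CC00AA and the second 0xDD00BB00.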

Poor code generation/optimisation in all versions of GCC x86-64 and x86-32

2018-11-05 Thread Stefan Kanthak
inverse "mov rdx, rcx". I also wonder why a shld is created here: at least for "n += n;" I expect a more straightforward add rax, rax adc rdx, rdx regards Stefan Kanthak PS: of course GCC x86-32 exhibits the same flaws with int64_t!

Bug in divmodhi4(), plus poor inperformant code

2018-12-04 Thread Stefan Kanthak
Hi @ll, libgcc's divmodhi4() function has an obvious bug; additionally it shows rather poor inperformant code: two of the three conditions tested in the first loop should clearly be moved outside the loop! divmodsi4() shows this inperformant code too! regards Stefan Kanthak --- divmod

Re: Bug in divmodhi4(), plus poor inperformant code

2018-12-04 Thread Stefan Kanthak
more sense, and a few targets do that. Moving 2 of 3 conditions from the loop is not an optimisation, but a necessity! In other words: why test 3 conditions in every pass of the loop when you need to test only 1 condition inside the loop, and the other 2 outside/before the loop? regards Stefa

Re: Bug in divmodhi4(), plus poor inperformant code

2018-12-06 Thread Stefan Kanthak
"Segher Boessenkool" wrote: > On Wed, Dec 05, 2018 at 02:19:14AM +0100, Stefan Kanthak wrote: >> "Paul Koning" wrote: >> >> > Yes, that's a rather nasty cut & paste error I made. >> >> I suspected that. >> Replacing &g

Optimiser failure for ternary foo == 0L ? NULL : bar;

2021-07-17 Thread Stefan Kanthak
e: 66 90 xchg %ax,%ax 10: 31 c0 xor %eax,%eax 12: c3 ret not amused Stefan Kanthak

Are some builtin functions (for example log() vs. sqrt()) more equal than others?

2021-07-30 Thread Stefan Kanthak
on log(sqrt(5.0) * 0.5 + 0.5)! NOT amused Stefan Kanthak

Re: Are some builtin functions (for example log() vs. sqrt()) more equal than others?

2021-07-30 Thread Stefan Kanthak
Joseph Myers wrote: > None of these are valid constant expressions as defined by the standard > (constant expressions cannot involve evaluated function calls). That's why I ask specifically why GCC bugs on log(log(...)), but not on log(sqrt(...) ...)! GCC also accepts following initializers an

Re: Are some builtin functions (for example log() vs. sqrt()) more equal than others?

2021-07-30 Thread Stefan Kanthak
"Joseph Myers" wrote: > On Fri, 30 Jul 2021, Stefan Kanthak wrote: > >> Joseph Myers wrote: >> >> > None of these are valid constant expressions as defined by the standard >> > (constant expressions cannot involve evaluated function calls). >

Suboptimal code generated for __buitlin_rint on AMD64 without SS4_4.1

2021-08-05 Thread Stefan Kanthak
Hi, targeting AMD64 alias x86_64 with -O3, GCC 10.2.0 generates the following code (12 instructions using 51 bytes, plus 4 quadwords using 32 bytes) for __builtin_rint() when -msse4.1 is NOT given: .text 0: f2 0f 10 15 10 00 00 00 movsd .LC1(%rip), %xmm2

Suboptimal code generated for __buitlin_ceil on AMD64 without SS4_4.1

2021-08-05 Thread Stefan Kanthak
Hi, targeting AMD64 alias x86_64 with -O3, GCC 10.2.0 generates the following code (17 instructions using 78 bytes, plus 6 quadwords using 48 bytes) for __builtin_ceil() when -msse4.1 is NOT given: .text 0: f2 0f 10 15 10 00 00 00 movsd .LC1(%rip), %xmm2

Suboptimal code generated for __buitlin_trunc on AMD64 without SS4_4.1

2021-08-05 Thread Stefan Kanthak
Hi, targeting AMD64 alias x86_64 with -O3, GCC 10.2.0 generates the following code (13 instructions using 57 bytes, plus 4 quadwords using 32 bytes) for __builtin_trunc() when -msse4.1 is NOT given: .text 0: f2 0f 10 15 10 00 00 00 movsd .LC1(%rip), %xmm2

Suboptimal code generated for __buitlin_floor on AMD64 without SS4_4.1

2021-08-05 Thread Stefan Kanthak
Hi, targeting AMD64 alias x86_64 with -O3, GCC 10.2.0 generates the following code (19 instructions using 86 bytes, plus 6 quadwords using 48 bytes) for __builtin_floor() when -msse4.1 is NOT given: .text 0: f2 0f 10 15 10 00 00 00 movsd .LC1(%rip), %xmm2

Re: Suboptimal code generated for __buitlin_trunc on AMD64 without SS4_4.1

2021-08-05 Thread Stefan Kanthak
Gabriel Paubert wrote: > On Thu, Aug 05, 2021 at 09:25:02AM +0200, Stefan Kanthak wrote: >> Hi, >> >> targeting AMD64 alias x86_64 with -O3, GCC 10.2.0 generates the >> following code (13 instructions using 57 bytes, plus 4 quadwords >> using 32 bytes) for _

Re: Suboptimal code generated for __buitlin_trunc on AMD64 without SS4_4.1

2021-08-06 Thread Stefan Kanthak
Gabriel Paubert wrote: > Hi, > > On Thu, Aug 05, 2021 at 01:58:12PM +0200, Stefan Kanthak wrote: >> Gabriel Paubert wrote: >> >> >> > On Thu, Aug 05, 2021 at 09:25:02AM +0200, Stefan Kanthak wrote: >> >> .intel_

Re: Suboptimal code generated for __buitlin_trunc on AMD64 without SS4_4.1

2021-08-06 Thread Stefan Kanthak
Michael Matz wrote: > Hello, > > On Fri, 6 Aug 2021, Stefan Kanthak wrote: > >> For -ffast-math, where the sign of -0.0 is not handled and the spurious >> invalid floating-point exception for |argument| >= 2**63 is acceptable, > > This claim would need to be p

Re: Suboptimal code generated for __buitlin_trunc on AMD64 without SS4_4.1

2021-08-06 Thread Stefan Kanthak
Gabriel Paubert wrote: > On Fri, Aug 06, 2021 at 02:43:34PM +0200, Stefan Kanthak wrote: >> Gabriel Paubert wrote: >> >> > Hi, >> > >> > On Thu, Aug 05, 2021 at 01:58:12PM +0200, Stefan Kanthak wrote: [...] >> >> The whole idea

Re: Suboptimal code generated for __buitlin_trunc on AMD64 without SS4_4.1

2021-08-06 Thread Stefan Kanthak
Richard Biener wrote: > On August 6, 2021 4:32:48 PM GMT+02:00, Stefan Kanthak > wrote: >>Michael Matz wrote: >>> Btw, have you made speed measurements with your improvements? >> >>No. [...] >>If the constant happens to be present in L1 cache, it MAY lo

Optimizer failure

2021-08-07 Thread Stefan Kanthak
Hi, for the function (really: ternary expressions) int dummy(int x) { #ifdef VARIANT x < 0 ? --x : x > 0 ? ++x : 0; #else x < 0 ? --x : x > 0 ? ++x : x; #endif } GCC 10.2.0 generates the following code targeting AMD64: testl %edi, %edi js .L0 leal 1(%rd

Re: Suboptimal code generated for __buitlin_trunc on AMD64 without SS4_4.1

2021-08-07 Thread Stefan Kanthak
Joseph Myers wrote: > On Fri, 6 Aug 2021, Stefan Kanthak wrote: PLEASE DON'T STRIP ATTRIBUTION LINES: I did not write the following paragraph! >> > I don't know what the standard says about NaNs in this case, I seem to >> > remember that arithmetic instructions

Superfluous branches due to insufficient flow analysis

2021-08-13 Thread Stefan Kanthak
Hi, compile the following naive implementation of nextafter() for AMD64: JFTR: ignore the aliasing casts, they don't matter here! $ cat repro.c double nextafter(double from, double to) { if (to != to) return to; // to is NAN if (from != from) return from; //

Re: Superfluous branches due to insufficient flow analysis

2021-08-14 Thread Stefan Kanthak
"Gabriel Ravier" wrote: Please don't FULL QUOTE! > On 8/13/21 8:58 PM, Stefan Kanthak wrote: >> Hi, >> >> compile the following naive implementation of nextafter() for AMD64: >> >> JFTR: ignore the aliasing casts, they don't matter here!

Re: 3rd deficiency (was: Superfluous branches due to insufficient flow analysis)

2021-08-14 Thread Stefan Kanthak
Gabriel Ravier wrote: Independent from the defunct flow analysis in the presence of NaNs, my example demonstrates another minor deficiency: know thy instruction set! See the comments in the assembly below. > On 8/13/21 8:58 PM, Stefan Kanthak wrote: >> Hi, >> >> compil

On(c)e more: optimizer failure

2021-08-21 Thread Stefan Kanthak
Hi, the following snippet is from the nextafter() function of --- repro.c --- #define Zero 0.0 double nextafter(double argx, double argy) { double z = argx; if (isnan(argx) || isnan(argy)) return argx + argy; if (argx == argy) return argx;

Re: On(c)e more: optimizer failure

2021-08-21 Thread Stefan Kanthak
nt: https://godbolt.org/z/1ra7zcsnd Replace if (isnan(argx) || isnan(argy)) return argx + argy; with if ((argx != argx) || (argy != argy)) return argx + argy; then feed the changed snippet to compiler explorer again, with and without -ffast-math Stefan > --matt > > On Sat, Aug
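
The substitution suggested above relies on the IEEE 754 rule that a NaN compares unequal to everything, including itself. A minimal sketch follows; the function names are hypothetical, and it assumes compilation without -ffast-math, because that option permits the compiler to assume NaNs never occur.

```c
#include <math.h>
#include <stdbool.h>

/* NaN test via the isnan() macro from <math.h>. */
static bool either_nan_macro(double x, double y)
{
    return isnan(x) || isnan(y);
}

/* NaN test via self-comparison: x != x holds only when x is a NaN. */
static bool either_nan_compare(double x, double y)
{
    return (x != x) || (y != y);
}
```

Under default floating-point semantics the two functions return the same result for every pair of inputs; the thread is about whether the compiler generates equally good code for both.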

Re: On(c)e more: optimizer failure

2021-08-21 Thread Stefan Kanthak
Jakub Jelinek wrote: > On Sat, Aug 21, 2021 at 09:40:16PM +0200, Stefan Kanthak wrote: >> > I believe your example doesn't take into account that the values can be NaN >> > which compares false in all situations. >> >> That's a misbelief! >> P

Re: On(c)e more: optimizer failure

2021-08-22 Thread Stefan Kanthak
Gabriel Ravier wrote: > On 8/21/21 10:19 PM, Stefan Kanthak wrote: >> Jakub Jelinek wrote: [...] >>> GCC doesn't do value range propagation of floating point values, not even >>> the special ones like NaNs, infinities, +/- zeros etc., and without that the &

Re: On(c)e more: optimizer failure

2021-08-23 Thread Stefan Kanthak
Gabriel Ravier wrote: > On 8/22/21 11:22 PM, Stefan Kanthak wrote: [ 2bugzilla | !2bugzilla ] >> You (and everybody else) is free to use GCC bugzilla. >> Everybody and me is but also free NOT to use GCC bugzilla. >> >> Stefan > > Yes, you are free not

Re: On(c)e more: optimizer failure

2021-08-23 Thread Stefan Kanthak
Gabriel Ravier wrote: > On 8/23/21 3:46 PM, Stefan Kanthak wrote: >> JFTR: do you consider your wild speculations to be on-topic here? > > I suppose I should apologize: I did not intend to make any accusations > here. No need to, I can stand a little heat. [...] > I

Re: On(c)e more: optimizer failure

2021-08-27 Thread Stefan Kanthak
Manuel López-Ibáñez wrote: > FWIW: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=24021 Thanks. So this bug may soon have a driver's license in some countries... One more for the road: $ cat wtf.c double wtf(double x) { return sqrt(x * x); // can the square ever be negative? } $ gcc -m64 -o-

B^HDEAD code generation (i386)

2023-01-09 Thread Stefan Kanthak
pop esi pop edi pop ebp ret .L9: mov ebx, edi # Ouch: GCC likes to play shell games! mov ecx, esi # mov edx, ebx # mov eax, ecx # pop ebx pop esi pop

B^HDEAD code generation (AMD64)

2023-01-09 Thread Stefan Kanthak
re's no need to modify ECX! cmovne rdx, rax cmovne rax, rsi ret .L9: mov rax, rsi mov rdx, rdi .L1: ret .L14: mov r8, r9 xor r9d, r9d mov rcx, r8 jmp .L4 20 superfluous instructio

EPIC optimiser failures (i386)

2023-01-09 Thread Stefan Kanthak
sub eax, DWORD PTR [esp+4] .endif seto ah setz al sub al, ah # al = ZF - OF .if 0 cbw cwde .else movsx eax, al .endif ret Stefan Kanthak

Widening multiplication, but no narrowing division [i386/AMD64]

2023-01-09 Thread Stefan Kanthak
ret .end JFTR: dependent on the magnitude of the numbers and the processor it MIGHT be better to omit comparison and branch: there's a trade-off between the latency of the (un-pipelined) division instruction and the latency of the conditional branch due to misprediction. Stefan Kanthak
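
The two operations named in the subject line can be written in portable C as follows. This is a sketch with hypothetical function names; it shows only the source-level idioms and does not demonstrate which instructions any particular compiler emits for them.

```c
#include <stdint.h>

/* Widening multiplication: 32 x 32 -> 64 bits.
   Casting one operand makes the product exact. */
static uint64_t widening_mul(uint32_t a, uint32_t b)
{
    return (uint64_t)a * b;
}

/* Narrowing division: 64 / 32 -> 32-bit quotient and remainder.
   Precondition (caller's responsibility): d != 0 and (n >> 32) < d,
   so that the quotient fits in 32 bits. */
static uint32_t narrowing_div(uint64_t n, uint32_t d, uint32_t *rem)
{
    *rem = (uint32_t)(n % d);
    return (uint32_t)(n / d);
}
```

The precondition on narrowing_div mirrors the hardware behaviour the thread refers to: the x86 DIV instruction raises a divide error when the quotient does not fit in the destination register.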

Re: Widening multiplication, but no narrowing division [i386/AMD64]

2023-01-09 Thread Stefan Kanthak
LIU Hao wrote: > On 2023/1/9 20:20, Stefan Kanthak wrote: >> Hi, >> >> GCC (and other C compilers too) support the widening multiplication >> of i386/AMD64 processors, but DON'T support their narrowing division: >> >> > > QWORD-DWORD division would c

Re: Widening multiplication, but no narrowing division [i386/AMD64]

2023-01-09 Thread Stefan Kanthak
"Paul Koning" wrote: >> On Jan 9, 2023, at 7:20 AM, Stefan Kanthak wrote: >> >> Hi, >> >> GCC (and other C compilers too) support the widening multiplication >> of i386/AMD64 processors, but DON'T support their narrowing division: > >

Re: Widening multiplication, but no narrowing division [i386/AMD64]

2023-01-09 Thread Stefan Kanthak
"Paul Koning" wrote: >> On Jan 9, 2023, at 10:20 AM, Stefan Kanthak wrote: >> >> "Paul Koning" wrote: >> >>>> On Jan 9, 2023, at 7:20 AM, Stefan Kanthak wrote: >>>> >>>> Hi, >>>> >>>>

Re: B^HDEAD code generation (AMD64)

2023-01-09 Thread Stefan Kanthak
"Thomas Koenig" wrote: > On 09.01.23 12:35, Stefan Kanthak wrote: >> 20 superfluous instructions of the total 102 instructions! > > The proper place for bug reports is https://gcc.gnu.org/bugzilla/ . OUCH: there's NO proper place for bugs at all! > Feel fre

Will GCC eventually support SSE2 or SSE4.1?

2023-05-25 Thread Stefan Kanthak
# ret 14 instructions in 33 bytes # 11 instructions in 32 bytes OUCH: why does GCC abuse EBX (and ECX too) and perform a superfluous memory write? Stefan Kanthak

Re: Will GCC eventually support SSE2 or SSE4.1?

2023-05-26 Thread Stefan Kanthak
"Jonathan Wakely" wrote: > On Fri, 26 May 2023, 08:01 Andrew Pinski via Gcc, wrote: > >> On Thu, May 25, 2023 at 11:56 PM Stefan Kanthak >> wrote: >>> >>> Hi, >>> >>> compile the following function on a system with Core

Re: Will GCC eventually support SSE2 or SSE4.1?

2023-05-26 Thread Stefan Kanthak
"Jonathan Wakely" wrote: > On Fri, 26 May 2023 at 09:00, Stefan Kanthak wrote: >> >> "Jonathan Wakely" wrote: >> >> > On Fri, 26 May 2023, 08:01 Andrew Pinski via Gcc, wrote: >> > >> >> On Thu, May 25, 2023 at 11:56 PM S

Re: Will GCC eventually support SSE2 or SSE4.1?

2023-05-26 Thread Stefan Kanthak
"Jakub Jelinek" wrote: > On Fri, May 26, 2023 at 10:59:03AM +0200, Stefan Kanthak wrote: >> 3) SSE4.1 is supported since Core2, but -march=core2 fails to enable it. >>That's bad, REALITY CHECK, please! > > You're wrong. > SSE4.1 first appe

Re: Will GCC eventually support SSE2 or SSE4.1?

2023-05-26 Thread Stefan Kanthak
"Jakub Jelinek" wrote: > On Fri, May 26, 2023 at 10:59:03AM +0200, Stefan Kanthak wrote: >> 3) SSE4.1 is supported since Core2, but -march=core2 fails to enable it. >>That's bad, REALITY CHECK, please! > > You're wrong. > SSE4.1 first appe

Re: Will GCC eventually support SSE2 or SSE4.1?

2023-05-26 Thread Stefan Kanthak
"Jonathan Wakely" wrote: > On Fri, 26 May 2023 at 12:29, Stefan Kanthak wrote: >> >> "Jakub Jelinek" wrote: >> >> > On Fri, May 26, 2023 at 10:59:03AM +0200, Stefan Kanthak wrote: >> >> 3) SSE4.1 is supported since Core2, but -marc

Re: Will GCC eventually support SSE2 or SSE4.1?

2023-05-26 Thread Stefan Kanthak
"Jonathan Wakely" wrote: > On Fri, 26 May 2023 at 12:42, Stefan Kanthak wrote: >> Why does the documentation FAIL to specify that CPU features given by >> -m* override -m32 or enables them in ADDITION to those enabled by -march=? > > Because it's obvious. I

Re: Will GCC eventually support SSE2 or SSE4.1?

2023-05-26 Thread Stefan Kanthak
"Jonathan Wakely" wrote: > On Fri, 26 May 2023 at 13:09, Stefan Kanthak wrote: >> >> "Jonathan Wakely" wrote: >> >> > On Fri, 26 May 2023 at 12:29, Stefan Kanthak >> > wrote: >> >> OUCH: as shown in https://godbolt.org/z

Re: Will GCC eventually support SSE2 or SSE4.1?

2023-05-26 Thread Stefan Kanthak
"Jonathan Wakely" wrote: > On Fri, 26 May 2023 at 13:23, Stefan Kanthak wrote: >> >> "Jonathan Wakely" wrote: >> >> > On Fri, 26 May 2023 at 12:42, Stefan Kanthak wrote: >> >> Why does the documentation FAIL to specify that CP

Re: Will GCC eventually support SSE2 or SSE4.1?

2023-05-26 Thread Stefan Kanthak
"Jakub Jelinek" wrote: > On Fri, May 26, 2023 at 02:19:54PM +0200, Stefan Kanthak wrote: >> > I find it very SURPRISING that you're only just learning the basics of >> > how to use gcc NOW, after YELLING about all the OUCH. >> >> I'm NOT

Re: Will GCC eventually support SSE2 or SSE4.1?

2023-05-26 Thread Stefan Kanthak
"Jonathan Wakely" wrote: > On Fri, 26 May 2023 at 14:55, Stefan Kanthak wrote: [...] >> NOT obvious is but that -m -march= does not clear any >> not supported in , i.e the last one does NOT win here. > > The last -march option selects the base set of instructi

Re: Will GCC eventually support SSE2 or SSE4.1?

2023-05-26 Thread Stefan Kanthak
"Jakub Jelinek" wrote: [...] > And for -m32 it is also the last option that wins, but as with > many other cases just last one from certain set of options. [...] > The -mISA options are processed left to right after as well as BEFORE > setting base from -march=. In other words: although -marc

Re: Will GCC eventually support SSE2 or SSE4.1?

2023-05-26 Thread Stefan Kanthak
"Jonathan Wakely" wrote: > On Fri, 26 May 2023 at 15:48, Stefan Kanthak wrote: >> >> "Jakub Jelinek" wrote: >> >> [...] >> >> > And for -m32 it is also the last option that wins, but as with >> > many other cases just

Re: Will GCC eventually support SSE2 or SSE4.1?

2023-05-26 Thread Stefan Kanthak
You wrote: > On 2023-05-26 14:46, Stefan Kanthak wrote: >> OOPS: why does GCC (ab)use the SSE2 alias "Willamette New Instruction Set" >> (... ...) >> OUCH: why does it FAIL to REALLY use SSE2, as shown in the comments on the >>right side? > > Pleas

Re: Will GCC eventually support SSE2 or SSE4.1?

2023-05-26 Thread Stefan Kanthak
"Jonathan Wakely" wrote: > On Fri, 26 May 2023 at 15:34, Stefan Kanthak wrote: >> >> "Jonathan Wakely" wrote: >> >> > On Fri, 26 May 2023 at 14:55, Stefan Kanthak >> > wrote: >> >> [...] >> >> >> NOT obv

GCC plays "Shell Game", but looses track of the shell covering the nought

2023-05-27 Thread Stefan Kanthak
--- demo.c --- int ispowerof2(unsigned long long argument) { return (argument != 0) && ((argument & argument - 1) == 0); } --- EOF --- GCC 12.2 gcc -m32 -O3 https://gcc.godbolt.org/z/YWP4zb8jd ispowerof2(unsigned long long): push edi # three registers clob

Re: GCC plays "Shell Game", but looses track of the shell covering the nought

2023-05-27 Thread Stefan Kanthak
tructions as 12.* Also note the difference to yesterdays demo.c: "thanks" to the added | (argument != 0) GCC does NOT generate SSE2 instructions any more. I don't know yet whether this change is a quirk or WTF, Stefan > Dave > > > On Sat, 27 May 2023 18:23:12 +0200 >

Epic code generator/optimiser failures

2023-05-27 Thread Stefan Kanthak
--- demo.c --- int ispowerof2(unsigned long long argument) { return (argument != 0) && ((argument & argument - 1) == 0); } --- EOF --- GCC 13.1 gcc -m32 -mavx -O3 # or -march=native instead of -mavx https://gcc.godbolt.org/z/T31Gzo85W ispowerof2(unsigned long long): vmovq xmm1, Q

Re: Will GCC eventually support SSE2 or SSE4.1?

2023-05-27 Thread Stefan Kanthak
You wrote: > On 2023-05-26 23:40, Stefan Kanthak wrote: >> Feel free to propose this alternative here (better elsewhere, where you'll >> earn less laughter). >> But don't forget that this 23-bit mantissa will be all zeroes for quite some >> 64-bit (and even 32-

Another epic optimiser failure

2023-05-27 Thread Stefan Kanthak
--- .c --- int ispowerof2(unsigned long long argument) { return __builtin_popcountll(argument) == 1; } --- EOF --- GCC 13.3 gcc -m32 -march=alderlake -O3 gcc -m32 -march=sapphirerapids -O3 gcc -m32 -mpopcnt -mtune=sapphirerapids -O3 https://gcc.godbolt.org/z/cToYrrY
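
The two formulations of ispowerof2() quoted in these threads compute the same predicate, which is easy to confirm. In the sketch below the suffixed function names are hypothetical, and __builtin_popcountll() is the GCC/Clang builtin used in the message itself.

```c
/* Mask form: argument & (argument - 1) clears the lowest set bit,
   so the result is zero exactly when at most one bit was set;
   the explicit test excludes zero. */
static int ispowerof2_mask(unsigned long long argument)
{
    return (argument != 0) && ((argument & (argument - 1)) == 0);
}

/* Population-count form: a power of two has exactly one bit set. */
static int ispowerof2_popcount(unsigned long long argument)
{
    return __builtin_popcountll(argument) == 1;
}
```

Both return 1 for 1, 2, 4, ..., 2**63 and 0 for everything else, including 0. The messages concern the machine code GCC generates for each form, not their results.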

Who cares about performance (or Intel's CPU errata)?

2023-05-27 Thread Stefan Kanthak
Just to show how SLOPPY, INCONSEQUENTIAL and INCOMPETENT GCC's developers are: --- dontcare.c --- int ispowerof2(unsigned __int128 argument) { return __builtin_popcountll(argument) + __builtin_popcountll(argument >> 64) == 1; } --- EOF --- GCC 13.3 gcc -march=haswell -O3 https://gcc.godb

Re: Another epic optimiser failure

2023-05-27 Thread Stefan Kanthak
"Jakub Jelinek" wrote, completely clueless: > On Sat, May 27, 2023 at 11:04:11PM +0200, Stefan Kanthak wrote: >> OUCH: popcnt writes the WHOLE result register, there is ABSOLUTELY >> no need to clear it beforehand nor to clear the higher 24 bits >> aft

Re: Another epic optimiser failure

2023-05-27 Thread Stefan Kanthak
"Andrew Pinski" wrote: > On Sat, May 27, 2023 at 2:38 PM Stefan Kanthak > wrote: >> >> "Jakub Jelinek" wrote, completely clueless: >> >>> On Sat, May 27, 2023 at 11:04:11PM +0200, Stefan Kanthak wrote: >>>> OUCH: popcnt writes

Re: Who cares about performance (or Intel's CPU errata)?

2023-05-27 Thread Stefan Kanthak
"Andrew Pinski" wrote: > On Sat, May 27, 2023 at 2:25 PM Stefan Kanthak > wrote: >> >> Just to show how SLOPPY, INCONSEQUENTIAL and INCOMPETENT GCC's developers >> are: >> >> --- dontcare.c --- >> int ispowerof2(unsigned __int12

Re: Who cares about performance (or Intel's CPU errata)?

2023-05-28 Thread Stefan Kanthak
"Andrew Pinski" wrote: > On Sat, May 27, 2023 at 3:54 PM Stefan Kanthak > wrote: [...] >> Nevertheless GCC fails to optimise code properly: >> >> --- .c --- >> int ispowerof2(unsigned long long argument) { >> return __builtin_popcountll(argu

Who cares about size? (was: Who cares about performance (or Intel's CPU errata)?)

2023-05-29 Thread Stefan Kanthak
"Andrew Pinski" wrote: > On Sat, May 27, 2023 at 3:54 PM Stefan Kanthak > wrote: >> Nevertheless GCC fails to optimise code properly: >> >> --- .c --- >> int ispowerof2(unsigned long long argument) { >> return __builtin_popcountll(argument) =

Will GCC eventually learn to use BSR or even TZCNT on AMD/Intel processors?

2023-06-05 Thread Stefan Kanthak
instead of code fiddling with the stack! Stefan Kanthak