Re: Another epic optimiser failure
Man, these clang fanboys sure are getting out of hand. I feel like all this garbage could easily be resolved by y'all showing this idiot the exact options required and attaching the resulting compiled assembly exactly as he wants it, or, if gcc doesn't produce the exact assembly he wants, explaining why gcc chose a different route than the quote-unquote "perfect assembly" he expects it to spit out. And Stefan? Ever heard the saying that "the loudest man in the room is always the weakest"?
Re: Who cares about performance (or Intel's CPU errata)?
"Andrew Pinski" wrote:

> On Sat, May 27, 2023 at 3:54 PM Stefan Kanthak
> wrote:

[...]

>> Nevertheless GCC fails to optimise code properly:
>>
>> --- .c ---
>> int ispowerof2(unsigned long long argument) {
>>     return __builtin_popcountll(argument) == 1;
>> }
>> --- EOF ---
>>
>> GCC 13.3    gcc -m32 -mpopcnt -O3
>>
>> https://godbolt.org/z/fT7a7jP4e
>> ispowerof2(unsigned long long):
>>     xor     eax, eax
>>     xor     edx, edx
>>     popcnt  eax, [esp+4]
>>     popcnt  edx, [esp+8]
>>     add     eax, edx            # eax is less than 64!
>>     cmp     eax, 1    ->    dec eax    # 2 bytes shorter
>
> dec eax is done for -Os already. -O2 means performance, it does not mean decrease size. dec can be slower as it can create a false dependency and it requires eax register to be not alive at the end of the statement. and IIRC for x86 decode, it could cause 2 (not 1) micro-ops.

It CAN, it COULD, but it does NOT NEED to: it all depends on the target processor. Shall I add an example with -march=?

>>     sete    al

Depending on the target processor, the partial register write can also harm performance. Did you forget to mention that too?

>>     movzx   eax, al             # superfluous
>
> No it is not superfluous, well ok it is because of the contents of eax (besides the lower 8 bits) are already zero'd

Correct. The same holds for example for PMOVMSKB when the high(er) lane(s) of the source [XYZ]MM register are (known to be) 0, for example after MOVQ; that's what GCC also fails to track.

> but keeping that track is a hard problem and is a Turing problem really.

Aren't such problems just there to be solved?

> And I suspect it would cause another false dependency later on too.

All these quirks can be avoided with the following 6-byte code sequence (same size as SETcc plus MOVZX) I used in one of my previous posts to fold any non-zero value to 1:

    neg     eax
    sbb     eax, eax
    neg     eax

No partial register writes, no false dependencies, no INC/DEC subtleties.
JFTR: AMD documents that SBB with same destination and source is handled in the register renamer; I suspect Intel processors do it too, albeit not documented.

> For -Os -march=skylake (and -Oz instead of -Os) we get:
>     popcnt  rdi, rdi
>     popcnt  rsi, rsi
>     add     esi, edi
>     xor     eax, eax
>     dec     esi
>     sete    al
>
> Which is exactly what you want right?

Yes. For -m32 -Os/-Oz, AND if CDQ breaks the dependency, it should be

    xor     eax, eax
    xor     edx, edx    ->    cdq        # 1 byte shorter
    popcnt  eax, [esp+4]
    popcnt  edx, [esp+8]
    add     eax, edx                     # eax is less than 64!
    cmp     eax, 1      ->    dec eax    # 2 bytes shorter

On AMD64 DEC is a 2-byte instruction; the following alternative code avoids its potential false dependency as well as other possible quirks, and also suits -Ot, -O2 and -O3 on processors where the register renamer handles the XOR:

    popcnt  rdi, rdi
    popcnt  rsi, rsi
    xor     eax, eax
    not     edi             # edi = -(edi + 1)
    sub     edi, esi        # edi = -(edi + 1 + esi)
    setz    al

For processors where the register renamer doesn't "execute" XOR, but MOV, the following code is an alternative for -Ot, -O2 and -O3:

    popcnt  rdi, rdi
    popcnt  rsi, rsi
    mov     eax, edi
    add     eax, esi
    cmp     eax, 1
    setz    al

Stefan
Re: Will GCC eventually support correct code compilation?
On 27/05/2023 20:16, Dave Blanchard wrote:
> On Fri, 26 May 2023 18:44:41 +0200
> David Brown via Gcc wrote:
>
>> On 26/05/2023 17:49, Stefan Kanthak wrote:
>>
>>> I don't like to argue with idiots: they beat me with experience!
>>>
>>> Stefan
>>
>> Stefan, you are clearly not happy about the /free/ compiler you are using, and its /free/ documentation (which, despite its flaws, is better than I have seen for most other compilers).
>
> When the flaws continue to stack up as things get provably worse over time, at some point you need to stop patting yourself on the back, riding on the coattails of your past successes, and get to work making things right.

I think your idea of "proof" might differ from that of everyone else. The GCC developers are entirely aware that their tools have bugs and scope for improvement, but anyone who has followed the project for any length of time can see it has continually progressed in many ways. There are regularly minor regressions, and occasionally serious issues - but the serious issues get fixed.

This is open source software. If newer versions were "getting provably worse over time", then people would simply fork earlier versions and use them. That's what happens in projects where a significant number of users or developers feel the project is moving in the wrong direction.

> At the very least, GCC documentation is HORRIBLE, as this previous thread proves.

Now I am sure that you don't know what "proof" is. In regard to documentation, this thread proves that GCC's documentation is not perfect, that the GCC developers know this, that they ask people for suggestions for improvement, and that they keep track of suggestions or complaints so that they can be fixed when time and resources allow.

> If the branch is rotten and splintered then maybe it's time to get off that branch and climb onto another one.

Feel free to do so.

>> Remember, these are people with /no/ obligation to help you.
>
> ... and it often shows!
My experience, like that of most people (judging from the mailing lists and the bugzilla discussions I have read), is different - those who treat the GCC developers politely and with the respect due any fellow human, get a great deal of help. They might not always agree on what should be changed, but even then you can generally come out of the discussion with an understanding of /why/ they cannot or will not change GCC as you'd like.

But - like everyone else - the GCC developers can quickly lose interest in helping those who come across as rude, demanding, unhelpful and wilfully ignorant. Some do gcc development as voluntary contributions, others are paid to work on it - but they are not paid by /you/. And none are paid to sit and listen to your tantrums.

> So is this proof of the technical and intellectual bankruptcy of the open source development model, or...?

No, it is not.

> If nobody wants to have detailed discussions about the technical workings of a very serious tool that millions are relying on day in and day out, what is this mailing list FOR, exactly?

It /is/ for such discussions. This thread has not been a discussion - it has been driven by someone who preferred to yell and whine rather than discuss, and insisted on continuing here rather than filing bug reports in the right places. The GCC developers prefer to work /with/ the users in finding out how to make the toolchain better - /that/ is what the mailing lists are for.
Re: Who cares about performance (or Intel's CPU errata)?
On 28/05/2023 01:30, Andrew Pinski via Gcc wrote:
> On Sat, May 27, 2023 at 3:54 PM Stefan Kanthak wrote:
>>     sete    al
>>     movzx   eax, al             # superfluous
>
> No it is not superfluous, well ok it is because of the contents of eax (besides the lower 8 bits) are already zero'd but keeping that track is a hard problem and is a Turing problem really. And I suspect it would cause another false dependency later on too.
>
> For -Os -march=skylake (and -Oz instead of -Os) we get:
>     popcnt  rdi, rdi
>     popcnt  rsi, rsi
>     add     esi, edi
>     xor     eax, eax
>     dec     esi
>     sete    al
>
> Which is exactly what you want right?
>
> Thanks,
> Andrew

There is also the option of using "bool" as the return type for boolean functions, rather than "int". When returning a "bool", gcc does not add the "movzx eax, al" instruction. (There are some circumstances where returning "int" for a boolean value is a better choice, but usually "bool" makes more sense, and it can often be a touch more efficient.)

David
problem building gcc-13.1.0: error: Pthreads are required to build libgomp
I'm trying to build the default target from the 13.1.0 sources and am hitting a "Pthreads are required" error. I have the .h and lib on my system, so I'm not sure why I'm hitting this error. I googled the error and see nothing recent about why I'd get it. Any suggestions? Please include me in any response, as I'm not sure I'm receiving gcc ml messages right now. Thanks!