Re: Another epic optimiser failure

2023-05-28 Thread Julian Waters via Gcc
Man, these clang fanboys sure are getting out of hand

I feel like all this garbage can be easily resolved by y'all showing this
idiot the exact proper options required and attaching the resulting
compiled assembly exactly as he wants it, or if gcc doesn't compile the
exact assembly he wants, explaining why gcc chose a different
route than the quote-unquote "Perfect assembly" that he expects it to spit
out.

And Stefan? Ever heard of the saying that "the loudest man in the room is
always the weakest"?


Re: Who cares about performance (or Intel's CPU errata)?

2023-05-28 Thread Stefan Kanthak
"Andrew Pinski"  wrote:

> On Sat, May 27, 2023 at 3:54 PM Stefan Kanthak wrote:

[...]

>> Nevertheless GCC fails to optimise code properly:
>>
>> --- .c ---
>> int ispowerof2(unsigned long long argument) {
>> return __builtin_popcountll(argument) == 1;
>> }
>> --- EOF ---
>>
>> GCC 13.3: gcc -m32 -mpopcnt -O3
>>
>> https://godbolt.org/z/fT7a7jP4e
>> ispowerof2(unsigned long long):
>> xor     eax, eax
>> xor     edx, edx
>> popcnt  eax, [esp+4]
>> popcnt  edx, [esp+8]
>> add     eax, edx             # eax is less than 64!
>> cmp     eax, 1    ->    dec eax  # 2 bytes shorter
>
> dec eax is done for -Os already. -O2 means performance, it does not
> mean decrease size. dec can be slower as it can create a false
> dependency and it requires eax register to be not alive at the end of
> the statement. and IIRC for x86 decode, it could cause 2 (not 1)
> micro-ops.

It CAN, it COULD, but it does NOT NEED to: it all depends on the target
processor. Shall I add an example with -march=?

>> sete    al

Depending on the target processor the partial register can also harm
the performance.
Did you forget to mention that too?

>> movzx   eax, al  # superfluous
>
> No it is not superfluous, well ok it is because of the context of eax
> (besides the lower 8 bits) are already zero'd

Correct.
The same holds for example for PMOVMSKB when the high(er) lane(s) of
the source [XYZ]MM register are (known to be) 0, for example after MOVQ;
that's what GCC also fails to track.

> but keeping that track is a hard problem and is turning problem really.

Aren't such problems just there to be solved?

> And I suspect it would cause another false dependency later on too.

All these quirks can be avoided with the following 6-byte code sequence
(same size as SETcc plus MOVZX) I used in one of my previous posts to
fold any non-zero value to 1:

    neg     eax
    sbb     eax, eax
    neg     eax

No partial register writes, no false dependencies, no INC/DEC subtleties.

JFTR: AMD documents that SBB with same destination and source is handled
  in the register renamer; I suspect Intel processors do it too,
  albeit not documented.
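
As a rough C model of that three-instruction sequence (a sketch only, not
compiler output; fold_nonzero is a name made up here, and the carry flag is
modelled as an ordinary variable):

```c
#include <stdint.h>

/* Hypothetical helper: models NEG / SBB same,same / NEG on a 32-bit value. */
uint32_t fold_nonzero(uint32_t x)
{
    uint32_t carry = (x != 0);  /* NEG sets CF exactly when its operand was non-zero */
    uint32_t r = 0u - carry;    /* SBB eax, eax: eax - eax - CF, i.e. 0 or 0xFFFFFFFF */
    return 0u - r;              /* second NEG: 0 stays 0, 0xFFFFFFFF becomes 1 */
}
```

For every 32-bit input this yields the same value as (x != 0).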

> For -Os -march=skylake (and -Oz instead of -Os) we get:
>     popcnt  rdi, rdi
>     popcnt  rsi, rsi
>     add     esi, edi
>     xor     eax, eax
>     dec     esi
>     sete    al
>
> Which is exactly what you want right?

Yes.
For -m32 -Os/-Oz, AND if CDQ breaks the dependency, it should be

    xor     eax, eax
    xor     edx, edx      ->    cdq      # 1 byte shorter
    popcnt  eax, [esp+4]
    popcnt  edx, [esp+8]
    add     eax, edx                     # eax is less than 64!
    cmp     eax, 1        ->    dec eax  # 2 bytes shorter
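
In C terms, the -m32 listing above counts the bits of each 32-bit half
separately and compares the sum with 1. A minimal sketch, assuming a GCC-
compatible compiler for __builtin_popcount (ispowerof2_split is a name
invented here for illustration):

```c
#include <stdint.h>

/* Hypothetical name; mirrors the -m32 listing: one POPCNT per 32-bit half. */
int ispowerof2_split(uint64_t argument)
{
    unsigned lo = (unsigned)__builtin_popcount((uint32_t)argument);
    unsigned hi = (unsigned)__builtin_popcount((uint32_t)(argument >> 32));
    unsigned sum = lo + hi;     /* ADD eax, edx: the sum is at most 64 */
    return sum - 1u == 0u;      /* DEC leaves ZF set exactly when the sum was 1 */
}
```

The result should agree with __builtin_popcountll(argument) == 1 for every
64-bit argument.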

On AMD64 DEC is a 2-byte instruction; the following alternative code
avoids its potential false dependency as well as other possible quirks,
and also suits -Ot, -O2 and -O3 on processors where the register renamer
handles the XOR:

    popcnt  rdi, rdi
    popcnt  rsi, rsi
    xor     eax, eax
    not     edi             # edi = -(edi + 1)
    sub     edi, esi        # edi = -(edi + 1 + esi)
    setz    al

For processors where the register renamer doesn't "execute" XOR, but MOV,
the following code is an alternative for -Ot, -O2 and -O3:

    popcnt  rdi, rdi
    popcnt  rsi, rsi
    mov     eax, edi
    add     eax, esi
    cmp     eax, 1
    setz    al

Stefan


Re: Will GCC eventually support correct code compilation?

2023-05-28 Thread David Brown

On 27/05/2023 20:16, Dave Blanchard wrote:
> On Fri, 26 May 2023 18:44:41 +0200 David Brown via Gcc wrote:
>> On 26/05/2023 17:49, Stefan Kanthak wrote:
>>> I don't like to argue with idiots: they beat me with experience!
>>>
>>> Stefan
>>
>> Stefan, you are clearly not happy about the /free/ compiler you
>> are using, and its /free/ documentation (which, despite its flaws,
>> is better than I have seen for most other compilers).
>
> When the flaws continue to stack up as things get provably worse over
> time, at some point you need to stop patting yourself on the back,
> riding on the coattails of your past successes, and get to work
> making things right.



I think your idea of "proof" might differ from that of everyone else. 
The GCC developers are entirely aware that their tools have bugs and 
scope for improvement, but anyone who has followed the project for any 
length of time can see it has continually progressed in many ways. 
There are regularly minor regressions, and occasionally serious issues - 
but the serious issues get fixed.


This is open source software.  If newer versions were "getting provably 
worse over time", then people would simply fork earlier versions and use 
them.  That's what happens in projects where a significant number of 
users or developers feel the project is moving in the wrong direction.



> At the very least, GCC documentation is HORRIBLE, as this previous
> thread proves.


Now I am sure that you don't know what "proof" is.  In regard to 
documentation, this thread proves that GCC's documentation is not 
perfect, that the GCC developers know this, that they ask people for 
suggestions for improvement, and that they keep track of suggestions or 
complaints so that they can be fixed when time and resources allow.




> If the branch is rotten and splintered then maybe it's time to get
> off that branch and climb onto another one.


Feel free to do so.




>> Remember, these are people with /no/ obligation to help you.
>
> ... and it often shows!


My experience, like that of most people (judging from the mailing lists 
and the bugzilla discussions I have read), is different - those who 
treat the GCC developers politely and with the respect due any fellow 
human, get a great deal of help.  They might not always agree on what 
should be changed, but even then you can generally come out of the 
discussion with an understanding of /why/ they cannot or will not change 
GCC as you'd like.


But - like everyone else - the GCC developers can quickly lose interest 
in helping those who come across as rude, demanding, unhelpful and 
wilfully ignorant.





>> Some do gcc development as voluntary contributions, others are paid
>> to work on it - but they are not paid by /you/.  And none are paid
>> to sit and listen to your tantrums.
>
> So is this proof of the technical and intellectual bankruptcy of
> the open source development model, or...?


No, it is not.



> If nobody wants to have detailed discussions about the technical
> workings of a very serious tool that millions are relying on day in
> and day out, what is this mailing list FOR, exactly?



It /is/ for such discussions.  This thread has not been a discussion - 
it has been driven by someone who preferred to yell and whine rather 
than discuss, and insisted on continuing here rather than filing bug 
reports in the right places.  The GCC developers prefer to work /with/ 
the users in finding out how to make the toolchain better - /that/ is 
what the mailing lists are for.





Re: Who cares about performance (or Intel's CPU errata)?

2023-05-28 Thread David Brown

On 28/05/2023 01:30, Andrew Pinski via Gcc wrote:
> On Sat, May 27, 2023 at 3:54 PM Stefan Kanthak wrote:
>>     sete    al
>>     movzx   eax, al  # superfluous
>
> No it is not superfluous, well ok it is because of the context of eax
> (besides the lower 8 bits) are already zero'd but keeping that track
> is a hard problem and is turning problem really. And I suspect it
> would cause another false dependency later on too.
>
> For -Os -march=skylake (and -Oz instead of -Os) we get:
>     popcnt  rdi, rdi
>     popcnt  rsi, rsi
>     add     esi, edi
>     xor     eax, eax
>     dec     esi
>     sete    al
>
> Which is exactly what you want right?
>
> Thanks,
> Andrew

There is also the option of using "bool" as the return type for boolean 
functions, rather than "int".  When returning a "bool", gcc does not add 
the "movzx eax, al" instruction.  (There are some circumstances where 
returning "int" for a boolean value is a better choice, but usually 
"bool" makes more sense, and it can often be a touch more efficient.)

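A sketch of that change applied to the function from the start of the
thread. The name ispowerof2_bool is chosen here for illustration only, and
whether the "movzx" actually disappears will depend on the compiler version
and flags, so the generated assembly is worth checking:

```c
#include <stdbool.h>

/* Same test as the original ispowerof2; only the return type differs. */
bool ispowerof2_bool(unsigned long long argument)
{
    return __builtin_popcountll(argument) == 1;
}
```

Callers that store the result in an "int" still get 0 or 1, since a "bool"
converts to exactly those values.
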

David




problem building gcc-13.1.0: error: Pthreads are required to build libgomp

2023-05-28 Thread L A Walsh

Trying to build default target in 13.1.0 source, and am hitting a
Pthreads are required error.

I have the .h and lib on my system, so not sure why hitting this error.

I goog'd the error and see nothing recent about why I'd get the error.

Any suggestions?

Please include me in response, as I'm not sure I'm getting gcc ml messages
right now.

Thanks!