Re: Question about generating vpmovzxbd instruction without using the interfaces in immintrin.h

2024-05-31 Thread Richard Biener via Gcc
On Fri, May 31, 2024 at 4:59 AM Hanke Zhang via Gcc  wrote:
>
> Hi,
> I've recently been trying to hand-write code to trigger automatic
> vectorization optimizations in GCC on Intel x86 machines (without
> using the interfaces in immintrin.h), but I'm running into a problem
> where I can't seem to get the concise `vpmovzxbd` or similar
> instructions.
>
> My requirement is to convert 8 `uint8_t` elements to `int32_t` type
> and print the output. If I use the interface (_mm256_cvtepu8_epi32) in
> immintrin.h, the code is as follows:
>
> int immintrin () {
> int size = 1, offset = 3;
> uint8_t* a = malloc(sizeof(char) * size);
>
> __v8si b = (__v8si)_mm256_cvtepu8_epi32(*(__m128i *)(a + offset));
>
> for (int i = 0; i < 8; i++) {
> printf("%d\n", b[i]);
> }
> }
>
> After compiling with -mavx2 -O3, you can get concise and efficient
> instructions. (You can see it here: https://godbolt.org/z/8ojzdav47)
>
> But if I do not use this interface and instead use a for-loop or the
> `__builtin_convertvector` interface provided by GCC, I cannot achieve
> the above effect. The code is as follows:
>
> typedef uint8_t v8qiu __attribute__ ((__vector_size__ (8)));
> int forloop () {
> int size = 1, offset = 3;
> uint8_t* a = malloc(sizeof(char) * size);
>
> v8qiu av = *(v8qiu *)(a + offset);
> __v8si b = {};
> for (int i = 0; i < 8; i++) {
> b[i] = (a + offset)[i];
> }
>
> for (int i = 0; i < 8; i++) {
> printf("%d\n", b[i]);
> }
> }
>
> int builtin_cvt () {
> int size = 1, offset = 3;
> uint8_t* a = malloc(sizeof(char) * size);
>
> v8qiu av = *(v8qiu *)(a + offset);
> __v8si b = __builtin_convertvector(av, __v8si);
>
> for (int i = 0; i < 8; i++) {
> printf("%d\n", b[i]);
> }
> }

Ideally both should work.  The loop case works when the loop
vectorizer is disabled: with -O3 -fno-tree-loop-vectorize it
produces

vpmovzxbd   3(%rax), %ymm0
vmovdqa %ymm0, (%rsp)

The loop vectorizer is constrained to using a single vector size
and thus makes a mess of it by unpacking the 8-element char
vector twice into four 2-element int vectors.

I do have plans to address this, but I am not sure whether they can
materialize for GCC 15.
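
As a minimal sketch of the two source forms discussed here (not a guaranteed recipe), the following writes the widening once as a scalar loop and once with `__builtin_convertvector`. The type names `u8x8` and `i32x8` and the function names are made up for illustration. Loads and stores go through `memcpy` so that an unaligned address such as `a + offset` is well defined, and the caller is assumed to supply at least 8 readable bytes (the quoted code allocates only `size` bytes). Whether either form collapses to a single `vpmovzxbd` depends on the GCC version and flags; per the reply above, `-O3 -mavx2 -fno-tree-loop-vectorize` is the combination reported to work for the loop.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Illustrative type names (not from the original mail). */
typedef uint8_t u8x8 __attribute__ ((vector_size (8)));
typedef int32_t i32x8 __attribute__ ((vector_size (32)));

/* Scalar form: zero-extend 8 bytes from src into 8 int32_t at dst.
   Build with -DNDEBUG when inspecting the generated assembly, so the
   precondition check does not show up in the output. */
void widen_loop (const uint8_t *src, int32_t *dst)
{
  assert (src != NULL && dst != NULL);
  for (int i = 0; i < 8; i++)
    dst[i] = src[i];
}

/* Vector form: the same conversion via GCC's generic vector extension. */
void widen_convertvector (const uint8_t *src, int32_t *dst)
{
  assert (src != NULL && dst != NULL);
  u8x8 in;
  memcpy (&in, src, sizeof in);                     /* unaligned-safe load */
  i32x8 out = __builtin_convertvector (in, i32x8);  /* uint8_t -> int32_t per lane */
  memcpy (dst, &out, sizeof out);                   /* unaligned-safe store */
}
```

Comparing the output of `gcc -O3 -mavx2 -S` with and without `-fno-tree-loop-vectorize` for these two functions should show whether a given compiler version still has the problem.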

> The instructions generated by both functions are redundant and
> complex, and are quite difficult to read compared to calling
> `_mm256_cvtepu8_epi32` directly. (You can see it here as well:
> https://godbolt.org/z/8ojzdav47)
>
> What I want to ask is: How should I write the source code to get
> assembly instructions similar to directly calling
> _mm256_cvtepu8_epi32?
>
> Or would it be easier if I modified the GIMPLE directly? But it seems
> that there is no relevant expression or interface directly
> corresponding to `vpmovzxbd` in GIMPLE.
>
> Thanks
> Hanke Zhang


Reply: Re: Is fcommon related with performance optimization logic?

2024-05-31 Thread Zhaohaifeng(Clark,CIS-HCE) via Gcc
Thanks.

The UnixBench source code is as follows:

unsigned long Run_Index;
Rec_Pointer Ptr_Glob,
Next_Ptr_Glob;
int Int_Glob;
Boolean Bool_Glob;
char Ch_1_Glob,
Ch_2_Glob;
int Arr_1_Glob [50];
int Arr_2_Glob [50] [50];
Boolean Reg = true;
long Begin_Time,
End_Time,
User_Time;
float Microseconds,
Dhrystones_Per_Second;

Some key results are as follows:

1.   Using gcc 10.3, the variables are laid out in reverse source order, from 
the last (Dhrystones_Per_Second) to the first (Run_Index), both in the assembly 
and in the final binary.
0x004040c0   0x0008   B  stderr@GLIBC_2.2.5
0x004040c8   0x0001   b  completed.0
0x004040e0   0x0004   B  Dhrystones_Per_Second
0x004040e4   0x0004   B  Microseconds
0x004040e8   0x0008   B  User_Time
0x004040f0   0x0008   B  End_Time
0x004040f8   0x0008   B  Begin_Time
0x00404100   0x0004   B  Reg
0x00404120   0x2710   B  Arr_2_Glob
0x00406840   0x00c8   B  Arr_1_Glob
0x00406908   0x0001   B  Ch_2_Glob
0x00406909   0x0001   B  Ch_1_Glob
0x0040690c   0x0004   B  Bool_Glob
0x00406910   0x0004   B  Int_Glob
0x00406918   0x0008   B  Next_Ptr_Glob
0x00406920   0x0008   B  Ptr_Glob
0x00406928   0x0008   B  Run_Index

If we change the order of the variables in the source code, the order in the 
assembly and the binary changes accordingly with gcc 10.3.


2.   Using gcc 8.5, the variables are laid out as follows, both in the assembly 
and in the final binary:
0x004040c0   0x0008   B  stderr@GLIBC_2.2.5
0x004040c8   0x0001   b  completed.0
0x004040e0   0x0008   B  Begin_Time
0x00404100   0x2710   B  Arr_2_Glob
0x00406810   0x0001   B  Ch_2_Glob
0x00406818   0x0008   B  Run_Index
0x00406820   0x0004   B  Microseconds
0x00406828   0x0008   B  Ptr_Glob
0x00406830   0x0004   B  Dhrystones_Per_Second
0x00406838   0x0008   B  End_Time
0x00406840   0x0004   B  Int_Glob
0x00406844   0x0004   B  Bool_Glob
0x00406848   0x0008   B  User_Time
0x00406850   0x0008   B  Next_Ptr_Glob
0x00406860   0x00c8   B  Arr_1_Glob
0x00406928   0x0001   B  Ch_1_Glob

If the variable order is changed in the source code, the order in the 
assembly and the binary does NOT change with gcc 8.5.
So we can see that the assembling process takes effect and that -fcommon 
arranges the variables following some special logic.


3.   If we change the source code by adding some int arrays between the 
variables, the performance with gcc 10.3 becomes similar to that with gcc 8.5. 
So it can be inferred that the caching behavior of the variables changes in 
this case, which has a great impact on this problem.

So the question is whether -fcommon has some intended performance 
optimization logic. If not, maybe this is just a random performance result. 
But the variable arrangement suggests that it has some special logic.
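
To check the cache-line hypothesis in point 3 directly, a small helper along the following lines can report whether two globals fall into the same cache line. This is only a sketch: the 64-byte line size is an assumption (common on current x86 parts, but not taken from this thread), and `same_cache_line` and `CACHE_LINE_BYTES` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed cache-line size in bytes; adjust for the machine under test. */
#define CACHE_LINE_BYTES 64u

/* Return true if the bytes at addresses a and b lie in the same cache line. */
bool same_cache_line (const void *a, const void *b)
{
  /* The line size is expected to be a power of two. */
  assert ((CACHE_LINE_BYTES & (CACHE_LINE_BYTES - 1)) == 0);
  return ((uintptr_t) a / CACHE_LINE_BYTES) == ((uintptr_t) b / CACHE_LINE_BYTES);
}
```

Printing, say, `same_cache_line (&Int_Glob, &Bool_Glob)` once at program start in each build would show which pairs move apart. With the addresses listed above, Int_Glob and Bool_Glob share a 64-byte line in both layouts, so a pair whose result differs between the two builds would be the interesting one to look for.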

Best regards,
Clark Zhao

This e-mail and its attachments contain confidential information from HUAWEI, 
which is intended only for the person or entity whose address is listed above. 
Any use of the information contained herein in any way (including, but not 
limited to, total or partial disclosure, reproduction, or dissemination) by 
persons other than the intended recipient(s) is prohibited. If you receive this 
e-mail in error, please notify the sender by phone or email immediately and 
delete it!

From: 赵海峰 [mailto:zju@qq.com]
Sent: May 31, 2024 16:27
To: Zhaohaifeng(Clark,CIS-HCE) 
Subject: Fw: Re: Is fcommon related with performance optimization logic?



---Original---
From: "Andrew Pinski" <pins...@gmail.com>
Date: Thu, May 30, 2024 10:27 AM
To: "赵海峰" <zju@qq.com>
Cc: "gcc" <gcc@gcc.gnu.org>
Subject: Re: Is fcommon related with performance optimization logic?

On Wed, May 29, 2024 at 7:13 PM 赵海峰 via Gcc wrote:
>
> Dear Sir/Madam,
>
>
> We found that running on intel SPR UnixBench compiled with gcc 10.3 performs 
> worse than with gcc 8.5 for dhry2reg benchmark.
>
>
> I found it related with -fcommon option which is disabled in 10.3 by default. 
> Fcommon will make global variables addresses in special order in bss 
> section(watching by nm -n) whatever they are defined in source code.
>
>
> We are wondering if fcommon has some special perform

Reply: Re: Is fcommon related with performance optimization logic?

2024-05-31 Thread Zhaohaifeng(Clark,CIS-HCE) via Gcc
Sorry for using another e-mail address, due to a network issue.

I tried the -fsection-anchors option, but it does not apply to this target.

Best regards
Clark Zhao

From: 赵海峰 [mailto:zju@qq.com]
Sent: May 31, 2024 16:51
To: Zhaohaifeng(Clark,CIS-HCE) 
Subject: Fw: Re: Is fcommon related with performance optimization logic?



---Original---
From: "David Brown" <david.br...@hesbynett.no>
Date: Thu, May 30, 2024 22:19 PM
To: "Andrew Pinski" <pins...@gmail.com>; "赵海峰" <zju@qq.com>
Cc: "gcc" <gcc@gcc.gnu.org>
Subject: Re: Is fcommon related with performance optimization logic?

On 30/05/2024 04:26, Andrew Pinski via Gcc wrote:
> On Wed, May 29, 2024 at 7:13 PM 赵海峰 via Gcc wrote:
>>
>> Dear Sir/Madam,
>>
>>
>> We found that running on intel SPR UnixBench compiled with gcc 10.3 performs 
>> worse than with gcc 8.5 for dhry2reg benchmark.
>>
>>
>> I found it related with -fcommon option which is disabled in 10.3 by 
>> default. Fcommon will make global variables addresses in special order in 
>> bss section(watching by nm -n) whatever they are defined in source code.
>>
>>
>> We are wondering if fcommon has some special performance optimization 
>> process?
>>
>>
>> (I also post the subject to gcc-help. Hope to get some suggestion in this 
>> mail list. Sorry for bothering.)
>
> This was already filed as
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114532 . But someone
> needs to go in and do more analysis of what is going wrong. The
> biggest difference for x86_64 is how the variables are laid out and by
> who (the compiler or the linker). There is some notion that
> -fno-common increases the number of L1-dcache-load-misses and that
> points to the layout of the variable differences causing the
> difference. But nobody has gone and seen which variables are laid out
> differently and why. I am suspecting that small changes in the
> code/variables would cause layout differences which will cause the
> cache misses which can cause the performance which is almost all by
> accident.
> I suspect adding -fdata-sections will cause another performance
> difference here too. And there is not much GCC can do about this since
> data layout is "hard" to do to get the best performance always.
>

(I am most familiar with embedded systems with static linking, rather
than dealing with GOT and other aspects of linking on big systems.)

I think -fno-common should allow -fsection-anchors to do a much better
job. If symbols are put in the common section, the compiler does not
know their relative position until link time. But if they are in bss or
data sections (with or without -fdata-sections), it can at least use
anchors to access data in the translation unit that defines the data
objects.

David


> Thanks,
> Andrew Pinski
>
>>
>>
>> Best regards.
>>
>>
>> Clark Zhao
>


How to avoid some built-in expansions in gcc?

2024-05-31 Thread Georg-Johann Lay

What's the recommended way to stop built-in expansions in gcc?

For example, avr-gcc expands isinff() to a bloated version of an 
isinff() implementation that's written in asm (PR115307).


Johann


Re: How to avoid some built-in expansions in gcc?

2024-05-31 Thread Jonathan Wakely via Gcc
On Fri, 31 May 2024 at 14:52, Georg-Johann Lay  wrote:
>
> What's the recommended way to stop built-in expansions in gcc?
>
> For example, avr-gcc expands isinff() to a bloated version of an
> isinff() implementation that's written in asm (PR115307).

Did you try -fno-builtin-isinff ?


Re: How to avoid some built-in expansions in gcc?

2024-05-31 Thread Georg-Johann Lay




On 31.05.24 at 15:56, Jonathan Wakely wrote:
> On Fri, 31 May 2024 at 14:52, Georg-Johann Lay  wrote:
>>
>> What's the recommended way to stop built-in expansions in gcc?
>>
>> For example, avr-gcc expands isinff() to a bloated version of an
>> isinff() implementation that's written in asm (PR115307).
>
> Did you try -fno-builtin-isinff ?


Are you saying that setting that option in, say, 
gcc/common/config/avr/avr-common.cc is the way to go?


Johann



Re: How to avoid some built-in expansions in gcc?

2024-05-31 Thread Jonathan Wakely via Gcc
On Fri, 31 May 2024 at 15:53, Georg-Johann Lay  wrote:
>
>
>
> On 31.05.24 at 15:56, Jonathan Wakely wrote:
> > On Fri, 31 May 2024 at 14:52, Georg-Johann Lay  wrote:
> >>
> >> What's the recommended way to stop built-in expansions in gcc?
> >>
> >> For example, avr-gcc expands isinff() to a bloated version of an
> >> isinff() implementation that's written in asm (PR115307).
> >
> > Did you try -fno-builtin-isinff ?
>
> Are you saying that setting that option in, say,
> gcc/common/config/avr/avr-common.cc is the way to go?

Ah, I didn't realise you meant permanently disable them, by default,
not just for a given compilation.


Re: How to avoid some built-in expansions in gcc?

2024-05-31 Thread Paul Koning via Gcc



> On May 31, 2024, at 9:52 AM, Georg-Johann Lay  wrote:
> 
> What's the recommended way to stop built-in expansions in gcc?
> 
> For example, avr-gcc expands isinff() to a bloated version of an isinff() 
> implementation that's written in asm (PR115307).
> 
> Johann

Isn't that up to the target back end?  It should define the optimization rules, 
and those should allow it to bias towards small code rather than fast big code.

paul



Re: How to avoid some built-in expansions in gcc?

2024-05-31 Thread Georg-Johann Lay




On 31.05.24 at 17:00, Paul Koning wrote:
>> On May 31, 2024, at 9:52 AM, Georg-Johann Lay  wrote:
>>
>> What's the recommended way to stop built-in expansions in gcc?
>>
>> For example, avr-gcc expands isinff() to a bloated version of an isinff() 
>> implementation that's written in asm (PR115307).
>>
>> Johann
>
> Isn't that up to the target back end?
>
> paul



Yes, that's the reason why it's a target PR.

My question is where/how to do it.

It's clear that twiddling the options works and is a simple and 
comprehensible solution, but it seems a bit of a hack to me.


Johann



Re: How to avoid some built-in expansions in gcc?

2024-05-31 Thread Paul Koning via Gcc



> On May 31, 2024, at 11:06 AM, Georg-Johann Lay  wrote:
> 
> 
> 
> On 31.05.24 at 17:00, Paul Koning wrote:
>>> On May 31, 2024, at 9:52 AM, Georg-Johann Lay  wrote:
>>> 
>>> What's the recommended way to stop built-in expansions in gcc?
>>> 
>>> For example, avr-gcc expands isinff() to a bloated version of an isinff() 
>>> implementation that's written in asm (PR115307).
>>> 
>>> Johann
>> Isn't that up to the target back end?
>>  paul
> 
> 
> Yes, that's the reason why it's a target PR.
> 
> My question is where/how to do it.
> 
> It's clear that twiddling the options works and is a simple and 
> comprehensible solution, but it seems a bit of a hack to me.
> 
> Johann

I haven't dug deep into this, but I would think at least part of the answer is 
in the target cost functions.  If those assign RTX cost according to size, then 
the result would be the optimizer would favor smaller code.  Right?

Does inline assembly expansion of builtins depend on target code supplying that 
expansion?  If so, the answer would be not to supply it, or at least not unless 
asked for by an option.  If it comes from common code, that's a different 
matter, then perhaps there should be target hooks to let the target disallow or 
discourage such expansion.  I might want such a thing for pdp11 as well.

paul



Re: How to avoid some built-in expansions in gcc?

2024-05-31 Thread Richard Biener via Gcc



> On 31.05.2024 at 17:25, Paul Koning via Gcc wrote:
> 
> 
> 
>> On May 31, 2024, at 11:06 AM, Georg-Johann Lay  wrote:
>> 
>> 
>> 
>> On 31.05.24 at 17:00, Paul Koning wrote:
>>>> On May 31, 2024, at 9:52 AM, Georg-Johann Lay  wrote:
>>>> 
>>>> What's the recommended way to stop built-in expansions in gcc?
>>>> 
>>>> For example, avr-gcc expands isinff() to a bloated version of an isinff() 
>>>> implementation that's written in asm (PR115307).
>>>> 
>>>> Johann
>>> Isn't that up to the target back end?
>>>paul
>> 
>> 
>> Yes, that's the reason why it's a target PR.
>> 
>> My question is where/how to do it.
>> 
>> It's clear that twiddling the options works and is a simple and 
>> comprehensible solution, but it seems a bit of a hack to me.
>> 
>> Johann
> 
> I haven't dug deep into this, but I would think at least part of the answer 
> is in the target cost functions.  If those assign RTX cost according to size, 
> then the result would be the optimizer would favor smaller code.  Right?
> 
> Does inline assembly expansion of builtins depend on target code supplying 
> that expansion?  If so, the answer would be not to supply it, or at least not 
> unless asked for by an option.  If it comes from common code, that's a 
> different matter, then perhaps there should be target hooks to let the target 
> disallow or discourage such expansion.  I might want such a thing for pdp11 
> as well.

The function in question is folded to a comparison very early if the target 
does not implement an optab for it.  After that everything is lost.  A 
workaround is to define an optab but let expansion always FAIL.

Richard 

>paul
> 
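
For illustration, the early folding described above is roughly equivalent to writing the test by hand as a floating-point comparison. The sketch below assumes the fold has the shape `isgreater (fabs (x), MAX)` with `MAX` the largest finite value of the type; the function names are hypothetical, and the exact form GCC uses may differ between versions.

```c
#include <assert.h>
#include <float.h>
#include <math.h>

/* Hand-written equivalent of isinff(): nonzero only for +Inf and -Inf.
   isgreater() does not raise on NaN operands and yields 0 for them. */
int isinf_as_compare (float x)
{
  return __builtin_isgreater (__builtin_fabsf (x), FLT_MAX);
}

/* Small usage check covering the boundary cases. */
void check_isinf_as_compare (void)
{
  assert (isinf_as_compare (INFINITY));
  assert (isinf_as_compare (-INFINITY));
  assert (!isinf_as_compare (FLT_MAX));
  assert (!isinf_as_compare (NAN));
}
```

Once the call has been rewritten into this form, nothing in the remaining pipeline can tell that it started out as `isinff()`, which is why a later hook cannot undo it.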


Re: How to avoid some built-in expansions in gcc?

2024-05-31 Thread Georg-Johann Lay




On 31.05.24 at 19:32, Richard Biener wrote:
>> On 31.05.2024 at 17:25, Paul Koning via Gcc wrote:
>>
>>> On May 31, 2024, at 11:06 AM, Georg-Johann Lay  wrote:
>>>
>>> On 31.05.24 at 17:00, Paul Koning wrote:
>>>>> On May 31, 2024, at 9:52 AM, Georg-Johann Lay  wrote:
>>>>>
>>>>> What's the recommended way to stop built-in expansions in gcc?
>>>>>
>>>>> For example, avr-gcc expands isinff() to a bloated version of an isinff() 
>>>>> implementation that's written in asm (PR115307).
>>>>>
>>>>> Johann
>>>> Isn't that up to the target back end?
>>>> paul
>>>
>>> Yes, that's the reason why it's a target PR.
>>>
>>> My question is where/how to do it.
>>>
>>> It's clear that twiddling the options works and is a simple and comprehensible 
>>> solution, but it seems a bit of a hack to me.
>>>
>>> Johann
>>
>> I haven't dug deep into this, but I would think at least part of the answer is 
>> in the target cost functions.  If those assign RTX cost according to size, then 
>> the result would be the optimizer would favor smaller code.  Right?
>>
>> Does inline assembly expansion of builtins depend on target code supplying that 
>> expansion?  If so, the answer would be not to supply it, or at least not unless 
>> asked for by an option.  If it comes from common code, that's a different 
>> matter, then perhaps there should be target hooks to let the target disallow or 
>> discourage such expansion.  I might want such a thing for pdp11 as well.
>
> The function in question is folded to a comparison very early if the target 
> does not implement an optab for it.  After that everything is lost.  A 
> workaround is to define an optab but let expansion always FAIL.
>
> Richard


Do you have a pointer on how to define a target optab? I looked into the 
optabs code but found no appropriate hook.  For isinf it seems it is 
enough to provide a failing expander, but other functions like isnan 
don't have an optab entry, so is there a hook mechanism to extend optabs?


I also looked into patching options, but there is no way to hook in, or 
at least I did not find how to use targetm.handle_option or an 
appropriate place to call disable_builtin_function; it's all baked into 
the C front end without any hook opportunity.


Johann




Re: How to avoid some built-in expansions in gcc?

2024-05-31 Thread Richard Biener via Gcc



> On 31.05.2024 at 20:56, Georg-Johann Lay wrote:
> 
> 
> 
> On 31.05.24 at 19:32, Richard Biener wrote:
>>> On 31.05.2024 at 17:25, Paul Koning via Gcc wrote:
>>> 
>>> 
>>> 
>>>> On May 31, 2024, at 11:06 AM, Georg-Johann Lay  wrote:
>>>> 
>>>> 
>>>> 
>>>> On 31.05.24 at 17:00, Paul Koning wrote:
>>>>>> On May 31, 2024, at 9:52 AM, Georg-Johann Lay  wrote:
>>>>>> 
>>>>>> What's the recommended way to stop built-in expansions in gcc?
>>>>>> 
>>>>>> For example, avr-gcc expands isinff() to a bloated version of an 
>>>>>> isinff() implementation that's written in asm (PR115307).
>>>>>> 
>>>>>> Johann
>>>>> Isn't that up to the target back end?
>>>>> paul
>>>> 
>>>> 
>>>> Yes, that's the reason why it's a target PR.
>>>> 
>>>> My question is where/how to do it.
>>>> 
>>>> It's clear that twiddling the options works and is a simple and 
>>>> comprehensible solution, but it seems a bit of a hack to me.
>>>> 
>>>> Johann
>>> 
>>> I haven't dug deep into this, but I would think at least part of the answer 
>>> is in the target cost functions.  If those assign RTX cost according to 
>>> size, then the result would be the optimizer would favor smaller code.  
>>> Right?
>>> 
>>> Does inline assembly expansion of builtins depend on target code supplying 
>>> that expansion?  If so, the answer would be not to supply it, or at least 
>>> not unless asked for by an option.  If it comes from common code, that's a 
>>> different matter, then perhaps there should be target hooks to let the 
>>> target disallow or discourage such expansion.  I might want such a thing 
>>> for pdp11 as well.
>> The function in question is folded to a comparison very early if the target 
>> does not implement an optab for it.  After that everything is lost.  A 
>> workaround is to define an optab but let expansion always FAIL.
>> Richard
> 
> Do you have a pointer on how to define a target optab? I looked into the 
> optabs code but found no appropriate hook.  For isinf it seems it is enough 
> to provide a failing expander, but other functions like isnan don't have an 
> optab entry, so is there a hook mechanism to extend optabs?

Just add corresponding optabs for the missing cases (some additions are 
pending, like isnormal).  There’s no hook to prevent folding to FP compares, nor 
is that guarded by other means (like availability of native FP ops).  Changing 
the guards would be another reasonable option.

Richard 

> I also looked into patching options, but there is no way to hook in, or at 
> least I did not find how to use targetm.handle_option or an appropriate place 
> to call disable_builtin_function; it's all baked into the C front end without 
> any hook opportunity.
> 
> Johann
> 
> 


gcc-13-20240531 is now available

2024-05-31 Thread GCC Administrator via Gcc
Snapshot gcc-13-20240531 is now available on
  https://gcc.gnu.org/pub/gcc/snapshots/13-20240531/
and on various mirrors, see https://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 13 git branch
with the following options: git://gcc.gnu.org/git/gcc.git branch 
releases/gcc-13 revision 2602b71103d5ef2ef86000cac832b31dad3dfe2b

You'll find:

 gcc-13-20240531.tar.xz   Complete GCC

  SHA256=f837bbda20f09f2c3016056d322f217dc147a3328d4e55096c9d0b0def9e71f1
  SHA1=926bc4baed75ec41fadd23cbffc7efd6ff9993cd

Diffs from 13-20240524 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-13
link is updated and a message is sent to the gcc list.  Please do not use
a snapshot before it has been announced that way.