[PATCH PR96366][AARCH64] Add support for unpacked vector sub

2020-08-03 Thread bule
Hi, 

The test case bb-slp-20.c in the GCC testsuite causes an ICE in the expand 
pass because there is no pattern for subtraction in VNx2SI mode. I think 
the problem has been fully discussed in PR 96366.

The attached file is the patch to solve this problem. Bootstrapped and tested 
on aarch64-linux-gnu. Ok for trunk?

Thanks,
Bruce




0001-PATCH-PR96366-AARCH64-Add-support-for-unpacked-sub.patch
Description: 0001-PATCH-PR96366-AARCH64-Add-support-for-unpacked-sub.patch


Re: [PATCH PR96366][AARCH64] Add support for unpacked vector sub

2020-08-03 Thread bule
Thanks for the review and commit.

Regards,
Bruce

-----Original Message-----
From: Richard Sandiford [mailto:richard.sandif...@arm.com] 
Sent: August 3, 2020 23:40
To: bule 
Cc: gcc-patches@gcc.gnu.org
Subject: Re: [PATCH PR96366][AARCH64] Add support for unpacked vector sub

bule  writes:
> Hi, 
>
> The test case bb-slp-20.c in the GCC testsuite causes an ICE in the expand 
> pass because there is no pattern for subtraction in VNx2SI mode. I think 
> the problem has been fully discussed in PR 96366.
>
> The attached file is the patch to solve this problem. Bootstrapped and tested 
> on aarch64-linux-gnu. Ok for trunk?

Thanks, pushed to trunk.

Richard


[AArch64][SVE][IPA] ICE caused by incompatibility of SRA and svst builtin-function

2020-04-01 Thread bule
Hello,

An internal compiler error (ICE) occurs in the IPA-SRA optimization pass when it 
handles the argument of the internal call svst3 for SVE.

The problem comes from gcc/testsuite/gcc.target/aarch64/sve/acle/asm/st2_bf16.c 
in the testsuite, which can be reduced to the following code:

#include 
#include
void st2_bf16_base (svbfloat16x3_t z1, svbool_t p0, bfloat16_t *x0,
                    intptr_t x1)
{
  svst3 (p0, x0, z1);
}

Compiled with -march=armv8.2-a+sve -msve-vector-bits=256 -O2, it results in 
a segmentation fault in IPA-SRA: 

> [bule@localhost gcc10_fail]$ gcc st2_bf16.i -o st2_bf16.s -S 
> -march=armv8.2-a+sve -msve-vector-bits=256 -O2 
> during IPA pass: sra
> st2_bf16.c: In function ‘st2_bf16_base’:
> st2_bf16.c:10:1: internal compiler error: Segmentation fault
>   .. /* omit some stack info here.  */ ..
> 0xa34f68 call_summary::get_create(cgraph_edge*)
> ../.././gcc/symbol-summary.h:642
> 0xa34f68 record_nonregister_call_use
> ../.././gcc/ipa-sra.c:1613
> 0xa34f68 scan_expr_access
> ../.././gcc/ipa-sra.c:1781
>   .. /* omit some stack info here.  */ ..
> Please submit a full bug report,
> with preprocessed source if appropriate.
> Please include the complete backtrace with any bug report.

Details can be found in PR 94398.
Similar problems occur with svst2, svst4 and other functions of this kind.

This problem is caused by the "record_nonregister_call_use" function trying to 
access the call graph edge of an internal call, .MASK_STORE_LANE, which is a 
NULL pointer.

Execution reaches "record_nonregister_call_use" because the caller, 
"scan_expr_access", considers the "svbfloat16x3_t z1" argument a valid 
candidate for further optimization.

A simple solution is to disqualify the candidate at the "scan_expr_access" 
level when the call graph edge is NULL, which indicates that the call is either 
an internal call or a call with no references. In both cases, further 
optimization should stop before it dereferences a NULL pointer.
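
The guard described above can be sketched in isolation. This is only a toy 
model, not the attached patch and not GCC code: ToyEdge, ToyCall, 
ToyCandidates and toy_scan_call are hypothetical names made up for 
illustration. A null "edge" stands in for an internal call such as 
.MASK_STORE_LANE, and clearing a "viable" flag stands in for disqualifying 
the candidate.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a call graph edge (not the real cgraph_edge).
struct ToyEdge { int callee_id; };

// Hypothetical stand-in for a call statement.
struct ToyCall {
  const ToyEdge *edge;                    // nullptr models an internal call
  std::vector<std::size_t> aggregate_args; // parameter indices passed to the call
};

// One viability flag per function parameter.
struct ToyCandidates {
  std::vector<bool> viable;
};

// Returns how many arguments were recorded as ordinary call uses.
// Arguments of a call without an edge are disqualified instead, so the
// null edge is never dereferenced.
inline std::size_t toy_scan_call (const ToyCall &call, ToyCandidates &cands)
{
  std::size_t recorded = 0;
  for (std::size_t param : call.aggregate_args)
    {
      assert (param < cands.viable.size ());
      if (call.edge == nullptr)
        {
          cands.viable[param] = false;  // give up on this candidate
          continue;
        }
      ++recorded;                       // safe: an edge exists
    }
  return recorded;
}
```

Calling toy_scan_call on a call with a null edge records nothing and marks 
the affected parameters as non-viable; a call with a real edge is recorded 
as usual.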

A proposed patch is attached.

Any suggestions?

Thanks,
Bu Le


ips-sra-sve-fix.patch
Description: ips-sra-sve-fix.patch


Re: [AArch64][SVE][IPA] ICE caused by incompatibility of SRA and svst builtin-function

2020-04-06 Thread bule
Hi, 

I tested the patch and it works fine. Treating the call like a section of 
assembly code is a more appropriate way to handle the context.

A minor question: I think the svst functions are store operations. Why 
pass ISRA_CTX_LOAD to scan_expr_access rather than ISRA_CTX_STORE?

Thanks,
Bu Le

-----Original Message-----
From: Martin Jambor [mailto:mjam...@suse.cz] 
Sent: April 7, 2020 7:21
To: bule 
Cc: Richard Biener ; gcc-patches@gcc.gnu.org
Subject: Re: [AArch64][SVE][IPA] ICE caused by incompatibility of SRA and svst builtin-function

Hi,

On Thu, Apr 02 2020, Richard Biener wrote:
> On Thu, Apr 2, 2020 at 5:36 AM bule  wrote:
>>
>> Hello,
>>
>> An internal compiler error (ICE) occurs in the IPA-SRA optimization pass when 
>> it handles the argument of the internal call svst3 for SVE.
>>
>> The problem comes from 
>> gcc/testsuite/gcc.target/aarch64/sve/acle/asm/st2_bf16.c in the testsuite, 
>> which can be reduced to the following code:
>>
>> #include 
>> #include
>> void st2_bf16_base (svbfloat16x3_t z1, svbool_t p0, bfloat16_t *x0, intptr_t 
>> x1) {
>> svst3 (p0, x0, z1);
>> }
>>
>> Compiled with -march=armv8.2-a+sve -msve-vector-bits=256 -O2, it results 
>> in a segmentation fault in IPA-SRA:
>>
>> > [bule@localhost gcc10_fail]$ gcc st2_bf16.i -o st2_bf16.s -S 
>> > -march=armv8.2-a+sve -msve-vector-bits=256 -O2
>> > during IPA pass: sra
>> > st2_bf16.c: In function ‘st2_bf16_base’:
>> > st2_bf16.c:10:1: internal compiler error: Segmentation fault
>> >   .. /* omit some stack info here.  */ ..
>> > 0xa34f68 call_summary::get_create(cgraph_edge*)
>> > ../.././gcc/symbol-summary.h:642
>> > 0xa34f68 record_nonregister_call_use
>> > ../.././gcc/ipa-sra.c:1613
>> > 0xa34f68 scan_expr_access
>> > ../.././gcc/ipa-sra.c:1781
>> >   .. /* omit some stack info here.  */ ..
>> > Please submit a full bug report,
>> > with preprocessed source if appropriate.
>> > Please include the complete backtrace with any bug report.
>>
>> Details can be found in PR 94398.
>> Similar problems occur with svst2, svst4 and other functions of this kind.
>>
>> This problem is caused by the "record_nonregister_call_use" function trying to 
>> access the call graph edge of an internal call, .MASK_STORE_LANE, which is a 
>> NULL pointer.
>>
>> Execution reaches "record_nonregister_call_use" because the caller, 
>> "scan_expr_access", considers the "svbfloat16x3_t z1" argument a valid 
>> candidate for further optimization.
>>
>> A simple solution is to disqualify the candidate at the "scan_expr_access" 
>> level when the call graph edge is NULL, which indicates that the call is either 
>> an internal call or a call with no references. In both cases, further 
>> optimization should stop before it dereferences a NULL pointer.
>>
>> A proposed patch is attached.
>>
>> Any suggestions?
>
> I think internal calls should be handled like asms, which means, looking 
> at the source a bit, passing ISRA_CTX_LOAD instead of ISRA_CTX_ARG to 
> scan_expr_access.
>

Indeed, in this situation it would be best if we simply treated such arguments 
as loads (from the aggregates).  Would the following (only very mildly tested) 
patch work for you?

Thanks,

Martin

diff --git a/gcc/ipa-sra.c b/gcc/ipa-sra.c
index f0ebaec708d..b225af61427 100644
--- a/gcc/ipa-sra.c
+++ b/gcc/ipa-sra.c
@@ -1870,15 +1870,22 @@ scan_function (cgraph_node *node, struct function *fun)
 	    case GIMPLE_CALL:
 	      {
 		unsigned argument_count = gimple_call_num_args (stmt);
-		scan_call_info call_info;
+		isra_scan_context ctx = ISRA_CTX_ARG;
+		scan_call_info call_info, *call_info_p = &call_info;
 		call_info.cs = node->get_edge (stmt);
-		call_info.argument_count = argument_count;
+		if (!call_info.cs)
+		  {
+		    call_info_p = NULL;
+		    ctx = ISRA_CTX_LOAD;
+		  }
+		else
+		  call_info.argument_count = argument_count;
 
 		for (unsigned i = 0; i < argument_count; i++)
 		  {
 		    call_info.arg_idx = i;
 		    scan_expr_access (gimple_call_arg (stmt, i), stmt,
-				      ISRA_CTX_ARG, bb, &call_info);
+				      ctx, bb, call_info_p);
 		  }
 
 		tree lhs = gimple_call_lhs (stmt);
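
The decision the patch makes before the argument loop can be modelled on 
its own. The following is a rough sketch rather than GCC code: SketchCtx, 
SketchEdge, SketchCallInfo, SketchScanPlan and sketch_plan_call_scan are 
hypothetical names that only mirror the shape of the hunk above. When no 
call graph edge exists, the arguments are scanned as loads and no call info 
is passed down; otherwise they are scanned as call arguments with the call 
info filled in.

```cpp
#include <cassert>

// Hypothetical mirror of the scan contexts named in the thread.
enum class SketchCtx { Load, Arg, Store };

// Hypothetical stand-in for a call graph edge.
struct SketchEdge {};

// Hypothetical mirror of the per-call bookkeeping.
struct SketchCallInfo {
  const SketchEdge *cs;
  unsigned argument_count;
  unsigned arg_idx;
};

// What the argument loop should use: a context and an optional call info.
struct SketchScanPlan {
  SketchCtx ctx;
  SketchCallInfo *info;
};

// Pick the scan context for a call: with no edge (an internal call),
// treat arguments as loads and pass no call info.
inline SketchScanPlan
sketch_plan_call_scan (const SketchEdge *edge, unsigned argument_count,
                       SketchCallInfo &storage)
{
  storage.cs = edge;
  if (!edge)
    return {SketchCtx::Load, nullptr};
  storage.argument_count = argument_count;
  return {SketchCtx::Arg, &storage};
}
```

With this shape, code further down only needs to check whether the call 
info pointer is null; it never has to look at the edge itself.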



Discussion about the medium code model in aarch64

2020-06-01 Thread bule
Hi,

I reported a PR in GCC Bugzilla about the medium code model in aarch64. A 
solution has been proposed and some discussion has been posted.

The details of the discussion can be found here: 
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95285

Wilco suggested that I make a PIC 48-bit code model by defining a new relocation 
type "high32_47" combined with the ADRP instruction, which I think is feasible and 
more efficient than my solution. But this kind of relocation hasn't been 
defined in Arm's ABI. He also doubts the necessity of the medium or 
large-pic code model.

My solution, on the other hand, only uses the existing relocation types 
R__MOVW_PREL_G0-3, which is also how LLVM solves similar problems. Although 
it is less efficient, it is currently easier to implement. As for the necessity 
concern, I need to optimize CESM in my work, and it happens to need 
this kind of large-pic code model. The abstracted test case is also provided in 
the bug report.
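
To make the relocation-based approach more concrete, here is a small 
sketch of the arithmetic involved. It assumes that the relocation names 
above refer to the AArch64 ELF MOVW_PREL group relocations, where G0 to G3 
each cover one 16-bit slice of the PC-relative offset (bits 15:0, 31:16, 
47:32 and 63:48). The function names are hypothetical, and the sketch 
ignores how a linker chooses between MOVZ and MOVN for signed values.

```cpp
#include <array>
#include <cstdint>

// Split a PC-relative offset into four 16-bit groups, lowest group first.
// groups[i] holds bits [16*i + 15 : 16*i] of the offset.
inline std::array<std::uint16_t, 4> split_pcrel_groups (std::int64_t offset)
{
  const std::uint64_t bits = static_cast<std::uint64_t> (offset);
  std::array<std::uint16_t, 4> groups{};
  for (unsigned i = 0; i < 4; ++i)
    groups[i] = static_cast<std::uint16_t> ((bits >> (16 * i)) & 0xffffu);
  return groups;
}

// Reassemble the offset from its four groups, as a sequence of four
// 16-bit immediate moves into one register would.
inline std::int64_t join_pcrel_groups (const std::array<std::uint16_t, 4> &groups)
{
  std::uint64_t bits = 0;
  for (unsigned i = 0; i < 4; ++i)
    bits |= static_cast<std::uint64_t> (groups[i]) << (16 * i);
  return static_cast<std::int64_t> (bits);
}
```

Four immediates are needed to cover a full 64-bit offset this way, which 
is one way to see why this route costs more instructions than an 
ADRP-based sequence.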

I would very much like to know your opinion on this issue.

Which solution do you think is more appropriate for the current situation? 

Regarding the necessity problem, I admit it is not a critical issue. But 
some applications in the HPC field do need this code model. 
Personally, I think it doesn't hurt for us to upstream a prototype first for 
customers to use. Later, if Arm publishes an official document regarding this code 
model, we can then make a standard model.
What's your opinion regarding this necessity problem?

Thanks a lot.

Regards,
Bu Le (Bruce)




PING: Discussion about the medium code model in aarch64

2020-06-12 Thread bule
Hi,

I reported a PR in GCC Bugzilla about the medium code model in aarch64. A 
solution has been proposed and some discussion has been posted.

The details of the discussion can be found here: 
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95285

Wilco suggested that I make a PIC 48-bit code model by defining a new relocation 
type "high32_47" combined with the ADRP instruction, which I think is feasible and 
more efficient than my solution. But this kind of relocation hasn't been 
defined in Arm's ABI. He also doubts the necessity of the medium or 
large-pic code model.

My solution, on the other hand, only uses the existing relocation types 
R__MOVW_PREL_G0-3, which is also how LLVM solves similar problems. Although 
it is less efficient, it is currently easier to implement. As for the necessity 
concern, I need to optimize CESM in my work, and it happens to need 
this kind of large-pic code model. The abstracted test case is also provided in 
the bug report.

I would very much like to know your opinion on this issue.

Which solution do you think is more appropriate for the current situation? 

Regarding the necessity problem, I admit it is not a critical issue. But 
some applications in the HPC field do need this code model. 
Personally, I think it doesn't hurt for us to upstream a prototype first for 
customers to use. Later, if Arm publishes an official document regarding this code 
model, we can then make a standard model.
What's your opinion regarding this necessity problem?

Thanks a lot.

Regards,
Bu Le (Bruce)