[PATCH PR96366][AARCH64] Add support for unpacked vector sub
Hi,

The test case bb-slp-20.c in the GCC testsuite causes an ICE in the expand pass because there is no pattern for subtraction in VNx2SI mode. I think the problem has been fully discussed in PR 96366.

The attached patch fixes this problem. Bootstrapped and tested on aarch64-linux-gnu. OK for trunk?

Thanks,
Bruce

Attachment: 0001-PATCH-PR96366-AARCH64-Add-support-for-unpacked-sub.patch
Re: [PATCH PR96366][AARCH64] Add support for unpacked vector sub
Thanks for the review and commit.

Regards,
Bruce

-----Original Message-----
From: Richard Sandiford [mailto:richard.sandif...@arm.com]
Sent: August 3, 2020 23:40
To: bule
Cc: gcc-patches@gcc.gnu.org
Subject: Re: [PATCH PR96366][AARCH64] Add support for unpacked vector sub

bule writes:
> Hi,
>
> The test case bb-slp-20.c in the gcc testsuite will cause an ICE in the expand
> pass due to the lack of a pattern for subtraction of the VNx2SI mode. I think
> the problem has been fully discussed on PR 96366.
>
> The attached file is the patch to solve this problem. Bootstrapped and tested
> on aarch64-linux-gnu. Ok for trunk?

Thanks, pushed to trunk.

Richard
[AArch64][SVE][IPA] ICE caused by incompatibility of SRA and svst builtin-function
Hello,

An Internal Compiler Error (ICE) occurs in the ipa-sra optimization pass when it handles the argument of the internal call generated for svst3 on SVE.

The problem comes from gcc/testsuite/gcc.target/aarch64/sve/acle/asm/st2_bf16.c in the testsuite, which can be reduced to the following code:

    #include
    #include
    void st2_bf16_base (svbfloat16x3_t z1, svbool_t p0, bfloat16_t *x0, intptr_t x1)
    {
      svst3 (p0, x0, z1);
    }

Compiled with -march=armv8.2-a+sve -msve-vector-bits=256 -O2, it results in a segmentation fault in IPA-SRA:

> [bule@localhost gcc10_fail]$ gcc st2_bf16.i -o st2_bf16.s -S
> -march=armv8.2-a+sve -msve-vector-bits=256 -O2
> during IPA pass: sra
> st2_bf16.c: In function ‘st2_bf16_base’:
> st2_bf16.c:10:1: internal compiler error: Segmentation fault
> .. /* omit some stack info here. */ ..
> 0xa34f68 call_summary::get_create(cgraph_edge*)
>         ../.././gcc/symbol-summary.h:642
> 0xa34f68 record_nonregister_call_use
>         ../.././gcc/ipa-sra.c:1613
> 0xa34f68 scan_expr_access
>         ../.././gcc/ipa-sra.c:1781
> .. /* omit some stack info here. */ ..
> Please submit a full bug report,
> with preprocessed source if appropriate.
> Please include the complete backtrace with any bug report.

Details can be found in PR 94398.

The same problem occurs with svst2, svst4 and other functions of this kind.

The problem is caused by the function "record_nonregister_call_use" trying to access the call graph edge of an internal call, .MASK_STORE_LANE, which is a NULL pointer.

Execution reaches "record_nonregister_call_use" because its caller, "scan_expr_access", considers the "svbfloat16x3_t z1" argument a valid candidate for further optimization.

A simple solution is to disqualify the candidate at the "scan_expr_access" level when the call graph edge is NULL, which indicates that the call is either an internal call or a call with no references. In both cases, the optimization should stop before it dereferences a NULL pointer.

A proposed patch is attached.

Any suggestions?

Thanks,
Bu Le

Attachment: ips-sra-sve-fix.patch
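The guard proposed in this message can be illustrated with a small stand-alone model. This is only a sketch under stated assumptions: `CallEdge`, `CallStmt`, `Scanner`, `get_edge` and `scan_call` are hypothetical names invented for illustration and are not GCC's real data structures or the contents of the attached patch. The model only shows the idea that a call with no call graph edge disqualifies the parameter instead of being recorded against a NULL edge.

```cpp
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for a call graph edge (not GCC's cgraph_edge).
struct CallEdge { std::string callee; };

// Hypothetical stand-in for a call statement and the candidate
// parameters (by index) that it passes as aggregate arguments.
struct CallStmt {
  std::string name;
  std::vector<int> aggregate_args;
};

struct Scanner {
  // Only ordinary calls have an entry here; internal calls do not.
  std::unordered_map<const CallStmt *, CallEdge> edges;
  // candidate[i] is true while parameter i is still eligible.
  std::vector<bool> candidate;
  // Uses recorded against a real edge: (callee name, parameter index).
  std::vector<std::pair<std::string, int>> recorded;

  // Returns nullptr when the statement has no call graph edge.
  const CallEdge *get_edge(const CallStmt &stmt) const {
    auto it = edges.find(&stmt);
    return it == edges.end() ? nullptr : &it->second;
  }

  void scan_call(const CallStmt &stmt) {
    const CallEdge *edge = get_edge(stmt);
    for (int param : stmt.aggregate_args) {
      if (edge == nullptr) {
        // No edge: give up on this parameter rather than dereference NULL.
        candidate[param] = false;
        continue;
      }
      recorded.emplace_back(edge->callee, param);
    }
  }
};
```

In this model an ordinary call keeps its parameter eligible and records the use, while a call without an edge simply drops the parameter from consideration.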
Re: [AArch64][SVE][IPA] ICE caused by incompatibility of SRA and svst builtin-function
Hi,

I tested the patch and it works fine. Treating the internal call like a section of assembly code is a more appropriate way to handle the context.

One minor question: the svst functions perform store operations, so why pass ISRA_CTX_LOAD to scan_expr_access rather than ISRA_CTX_STORE?

Thanks,
Bu Le

-----Original Message-----
From: Martin Jambor [mailto:mjam...@suse.cz]
Sent: April 7, 2020 7:21
To: bule
Cc: Richard Biener; gcc-patches@gcc.gnu.org
Subject: Re: [AArch64][SVE][IPA] ICE caused by incompatibility of SRA and svst builtin-function

Hi,

On Thu, Apr 02 2020, Richard Biener wrote:
> On Thu, Apr 2, 2020 at 5:36 AM bule wrote:
>>
>> Hello,
>>
>> An Internal Compiler Error(ICE) is found in ipa-sra optimization pass when
>> it handle the argument of internal call svst3 for SVE.
>>
>> The problem comes from
>> gcc/testsuite/gcc.target/aarch64/sve/acle/asm/st2_bf16.c in the testsuite,
>> which can be reduced to the following code:
>>
>> #include
>> #include
>> void st2_bf16_base (svbfloat16x3_t z1, svbool_t p0, bfloat16_t *x0, intptr_t
>> x1) {
>> svst3 (p0, x0, z1);
>> }
>>
>> Compiled with -march=armv8.2-a+sve -msve-vector-bits=256 -O2, it will result
>> in a segmentation fault in IPA-SRA:
>>
>> > [bule@localhost gcc10_fail]$ gcc st2_bf16.i -o st2_bf16.s -S
>> > -march=armv8.2-a+sve -msve-vector-bits=256 -O2 during IPA pass: sra
>> > st2_bf16.c: In function ‘st2_bf16_base’:
>> > st2_bf16.c:10:1: internal compiler error: Segmentation fault
>> > .. /* omit some stack info here. */ ..
>> > 0xa34f68 call_summary::get_create(cgraph_edge*)
>> >         ../.././gcc/symbol-summary.h:642
>> > 0xa34f68 record_nonregister_call_use
>> >         ../.././gcc/ipa-sra.c:1613
>> > 0xa34f68 scan_expr_access
>> >         ../.././gcc/ipa-sra.c:1781
>> > .. /* omit some stack info here. */ ..
>> > Please submit a full bug report,
>> > with preprocessed source if appropriate.
>> > Please include the complete backtrace with any bug report.
>>
>> Details can be found in PR 94398.
>>
>> Similar problem can be found in svst2, svst4 and other functions of this kind.
>>
>> This problem is caused by "record_nonregister_call_use" function trying to
>> access the call graph edge of an internal call, .MASK_STORE_LANE, which is a
>> NULL pointer.
>>
>> The reason of stepping into "record_nonregister_call_use" function is that
>> the upper level function "scan_expr_access" considered the "svbfloat16x3_t
>> z1" argument as a valid candidate for further optimization.
>>
>> A simple solution here is to disqualify the candidate at "scan_expr_access"
>> level when the call graph edge is null, which indicates the call is either
>> an internal call or a call with no references. For both cases, the further
>> optimization process should stop before it references a NULL pointer.
>>
>> A proposed patch is attached.
>>
>> Any suggestions?
>
> I think internal calls should be handled like asms which means, looking
> at the source a bit, instead of ISRA_CTX_ARG pass ISRA_CTX_LOAD to
> scan_expr_access.
>

Indeed, in this situation it would be best if we simply treated such arguments as loads (from the aggregates). Would the following (only very mildly tested) patch work for you?
Thanks,

Martin

diff --git a/gcc/ipa-sra.c b/gcc/ipa-sra.c
index f0ebaec708d..b225af61427 100644
--- a/gcc/ipa-sra.c
+++ b/gcc/ipa-sra.c
@@ -1870,15 +1870,22 @@ scan_function (cgraph_node *node, struct function *fun)
         case GIMPLE_CALL:
           {
             unsigned argument_count = gimple_call_num_args (stmt);
-            scan_call_info call_info;
+            isra_scan_context ctx = ISRA_CTX_ARG;
+            scan_call_info call_info, *call_info_p = &call_info;
             call_info.cs = node->get_edge (stmt);
-            call_info.argument_count = argument_count;
+            if (!call_info.cs)
+              {
+                call_info_p = NULL;
+                ctx = ISRA_CTX_LOAD;
+              }
+            else
+              call_info.argument_count = argument_count;
 
             for (unsigned i = 0; i < argument_count; i++)
               {
                 call_info.arg_idx = i;
                 scan_expr_access (gimple_call_arg (stmt, i), stmt,
-                                  ISRA_CTX_ARG, bb, &call_info);
+                                  ctx, bb, call_info_p);
               }
 
             tree lhs = gimple_call_lhs (stmt);
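The decision made by the hunk above can be restated as a small stand-alone function, which may make the control flow easier to follow outside the diff. This is a simplified sketch and not GCC code: `ScanContext`, `CallInfo`, `ArgScanPlan` and `plan_argument_scan` are hypothetical names, with the enum values loosely named after the ISRA_CTX_* constants mentioned in the thread.

```cpp
// Hypothetical scan contexts, named after the ISRA_CTX_* values
// discussed in the thread (not GCC's actual enum).
enum class ScanContext { Load, Store, Arg };

// Hypothetical per-call bookkeeping; `edge` models the call graph edge
// and is null for calls that have none (such as internal calls).
struct CallInfo {
  const void *edge = nullptr;
  unsigned argument_count = 0;
  unsigned arg_idx = 0;
};

// What the argument loop should use: a context and, for ordinary calls
// only, a pointer to the call bookkeeping.
struct ArgScanPlan {
  ScanContext ctx;
  CallInfo *info;
};

ArgScanPlan plan_argument_scan(CallInfo &info, unsigned argument_count) {
  if (info.edge == nullptr) {
    // No edge: treat every argument as a plain load and pass no call info.
    return {ScanContext::Load, nullptr};
  }
  // Ordinary call: keep the argument context and fill in the count.
  info.argument_count = argument_count;
  return {ScanContext::Arg, &info};
}
```

The point of returning a null `info` pointer in the edge-less case is that later code has nothing to dereference, which is the property the patch relies on.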
Discussion about the medium code model in aarch64
Hi,

I reported a PR in GCC Bugzilla about the medium code model in aarch64. A solution has been proposed and some discussion has been posted. The details can be found here: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95285

Wilco suggested that I build a PIC 48-bit code model by defining a new relocation type "high32_47" to be combined with the ADRP instruction, which I think is feasible and more efficient than my solution. But this kind of relocation has not been defined in Arm's ABI. He also doubts the necessity of a medium or large-PIC code model.

My solution, on the other hand, only uses the existing relocation types R__MOVW_PREL_G0-3, which is also how LLVM solves similar problems. Although it is less efficient, it is currently easier to implement.

As for the necessity concern: I need to optimize CESM in my work, and for that I happen to need this kind of large-PIC code model. The reduced test case is also provided in the bug report.

I would very much like to know your opinion on this issue. Which solution do you think is more appropriate for the current situation?

Regarding necessity, I admit it is not a critical issue, but some applications in the HPC field do need this code model. Personally, I think it would not hurt to upstream a prototype first for customers to use. Later, if Arm publishes an official document for this code model, we can then turn it into a standard model. What is your opinion on the necessity question?

Thanks a lot.

Regards,
Bu Le (Bruce)
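For readers unfamiliar with the relocation families mentioned above, the arithmetic behind them can be sketched as follows. The MOVW_PREL_G0-3 group relocations each carry one 16-bit slice of a PC-relative offset, while ADRP encodes a signed 21-bit count of 4 KiB pages and therefore reaches roughly ±4 GiB. The helper names below (`split_groups`, `join_groups`, `fits_adrp`, `fits_signed_48`) are hypothetical, and the sketch does not model the proposed "high32_47" relocation beyond noting that it would cover bits 32 to 47, the same bits as group G2 here.

```cpp
#include <array>
#include <cstdint>

// Split a 64-bit PC-relative offset into four 16-bit groups:
// G0 = bits 0-15, G1 = bits 16-31, G2 = bits 32-47, G3 = bits 48-63.
std::array<std::uint16_t, 4> split_groups(std::int64_t offset) {
  const std::uint64_t bits = static_cast<std::uint64_t>(offset);
  return {static_cast<std::uint16_t>(bits & 0xffff),
          static_cast<std::uint16_t>((bits >> 16) & 0xffff),
          static_cast<std::uint16_t>((bits >> 32) & 0xffff),
          static_cast<std::uint16_t>((bits >> 48) & 0xffff)};
}

// Reassemble the offset from its groups, as a MOVZ/MOVK sequence would.
std::int64_t join_groups(const std::array<std::uint16_t, 4> &g) {
  const std::uint64_t bits = static_cast<std::uint64_t>(g[0]) |
                             (static_cast<std::uint64_t>(g[1]) << 16) |
                             (static_cast<std::uint64_t>(g[2]) << 32) |
                             (static_cast<std::uint64_t>(g[3]) << 48);
  return static_cast<std::int64_t>(bits);
}

// ADRP: signed 21-bit page count times 4 KiB pages, so the page
// difference must lie in [-2^32, 2^32).
bool fits_adrp(std::int64_t page_delta_bytes) {
  return page_delta_bytes >= -(1LL << 32) && page_delta_bytes < (1LL << 32);
}

// Whether an offset is representable as a signed 48-bit value.
bool fits_signed_48(std::int64_t offset) {
  return offset >= -(1LL << 47) && offset < (1LL << 47);
}
```

Under these assumptions, an offset that fails `fits_adrp` but passes `fits_signed_48` is the case the thread is about: too far for a plain ADRP-based sequence, yet coverable either by four group relocations or by ADRP plus one extra slice for bits 32 to 47.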
PING: Discussion about the medium code model in aarch64
Hi,

I reported a PR in GCC Bugzilla about the medium code model in aarch64. A solution has been proposed and some discussion has been posted. The details can be found here: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95285

Wilco suggested that I build a PIC 48-bit code model by defining a new relocation type "high32_47" to be combined with the ADRP instruction, which I think is feasible and more efficient than my solution. But this kind of relocation has not been defined in Arm's ABI. He also doubts the necessity of a medium or large-PIC code model.

My solution, on the other hand, only uses the existing relocation types R__MOVW_PREL_G0-3, which is also how LLVM solves similar problems. Although it is less efficient, it is currently easier to implement.

As for the necessity concern: I need to optimize CESM in my work, and for that I happen to need this kind of large-PIC code model. The reduced test case is also provided in the bug report.

I would very much like to know your opinion on this issue. Which solution do you think is more appropriate for the current situation?

Regarding necessity, I admit it is not a critical issue, but some applications in the HPC field do need this code model. Personally, I think it would not hurt to upstream a prototype first for customers to use. Later, if Arm publishes an official document for this code model, we can then turn it into a standard model. What is your opinion on the necessity question?

Thanks a lot.

Regards,
Bu Le (Bruce)