Errors when invoking refs_may_alias_p_1
Hi all,

I have instrumented a function call like foo(&a,&b) into the GIMPLE SSA representation (gcc-4.5), and the subsequent optimization passes cannot handle my instrumented code. The backtrace is below. The error occurred when the DSE pass tried to test whether the call I inserted may use a memory reference: the argument &a is neither an SSA_VAR nor an INDIRECT_REF, so the assert in

bool
refs_may_alias_p_1 (ao_ref *ref1, ao_ref *ref2, bool tbaa_p)

  gcc_assert ((!ref1->ref
	       || SSA_VAR_P (ref1->ref)
	       || handled_component_p (ref1->ref)
	       || INDIRECT_REF_P (ref1->ref)
	       || TREE_CODE (ref1->ref) == TARGET_MEM_REF
	       || TREE_CODE (ref1->ref) == CONST_DECL)
	      && (!ref2->ref
		  || SSA_VAR_P (ref2->ref)
		  || handled_component_p (ref2->ref)
		  || INDIRECT_REF_P (ref2->ref)
		  || TREE_CODE (ref2->ref) == TARGET_MEM_REF
		  || TREE_CODE (ref2->ref) == CONST_DECL));

was violated.

Does anyone know why the function arguments must be an SSA_VAR or INDIRECT_REF here? Have I missed any step needed to maintain the consistency of GIMPLE SSA?

#0  0x76fc8ee0 in exit () from /lib/libc.so.6
#1  0x005ae4ce in diagnostic_action_after_output (context=0x1323880, diagnostic=0x7fffd870) at ../../src/gcc/diagnostic.c:198
#2  0x005aed54 in diagnostic_report_diagnostic (context=0x1323880, diagnostic=0x7fffd870) at ../../src/gcc/diagnostic.c:424
#3  0x005afdc3 in internal_error (gmsgid=0xddfb57 "in %s, at %s:%d") at ../../src/gcc/diagnostic.c:709
#4  0x005aff4f in fancy_abort (file=0xe42670 "../../src/gcc/tree-ssa-alias.c", line=786, function=0xe427e0 "refs_may_alias_p_1") at ../../src/gcc/diagnostic.c:763
#5  0x008a1adb in refs_may_alias_p_1 (ref1=0x7fffdab0, ref2=0x7fffdb50, tbaa_p=1 '\001') at ../../src/gcc/tree-ssa-alias.c:775
#6  0x008a2b12 in ref_maybe_used_by_call_p_1 (call=0x76790630, ref=0x7fffdb50) at ../../src/gcc/tree-ssa-alias.c:1133
#7  0x008a2d2e in ref_maybe_used_by_call_p (call=0x76790630, ref=0x76848048) at ../../src/gcc/tree-ssa-alias.c:1147
#8  0x008a2dfa in ref_maybe_used_by_stmt_p (stmt=0x76790630, ref=0x76848048) at ../../src/gcc/tree-ssa-alias.c:1179
#9  0x008bf275 in dse_possible_dead_store_p (stmt=0x7683e820, use_stmt=0x7fffdca8) at ../../src/gcc/tree-ssa-dse.c:212
#10 0x008bfeb9 in dse_optimize_stmt (dse_gd=0x7fffddd0, bd=0x156bd30, gsi=...) at ../../src/gcc/tree-ssa-dse.c:297
#11 0x008c029d in dse_enter_block (walk_data=0x7fffdde0, bb=0x76a75068) at ../../src/gcc/tree-ssa-dse.c:370
#12 0x00cc26a5 in walk_dominator_tree (walk_data=0x7fffdde0, bb=0x76a75068) at ../../src/gcc/domwalk.c:185
#13 0x008c0812 in tree_ssa_dse () at ../../src/gcc/tree-ssa-dse.c:430
#14 0x0073af0a in execute_one_pass (pass=0x13cced0) at ../../src/gcc/passes.c:1572
#15 0x0073b21a in execute_pass_list (pass=0x13cced0) at ../../src/gcc/passes.c:1627
#16 0x0073b238 in execute_pass_list (pass=0x1312720) at ../../src/gcc/passes.c:1628
#17 0x0086e372 in tree_rest_of_compilation (fndecl=0x76b93500) at ../../src/gcc/tree-optimize.c:413
#18 0x009fa7c5 in cgraph_expand_function (node=0x76be7000) at ../../src/gcc/cgraphunit.c:1548
#19 0x009faa49 in cgraph_expand_all_functions () at ../../src/gcc/cgraphunit.c:1627
#20 0x009fb07e in cgraph_optimize () at ../../src/gcc/cgraphunit.c:1875
#21 0x009f9461 in cgraph_finalize_compilation_unit () at ../../src/gcc/cgraphunit.c:1096
#22 0x004a9e93 in c_write_global_declarations () at ../../src/gcc/c-decl.c:9519
#23 0x008180d4 in compile_file () at ../../src/gcc/toplev.c:1065
#24 0x0081a1c5 in do_compile () at ../../src/gcc/toplev.c:2417
#25 0x0081a286 in toplev_main (argc=21, argv=0x7fffe0f8) at ../../src/gcc/toplev.c:2459
#26 0x00519c6b in main (argc=21, argv=0x7fffe0f8) at ../../src/gcc/main.c:35

Thanks,
Hongtao
Purdue University
Re: Errors when invoking refs_may_alias_p_1
On 08/27/10 12:35, Richard Guenther wrote:
> On Fri, Aug 27, 2010 at 5:27 PM, Hongtao wrote:
>> Hi all,
>>
>> I have instrumented a function call like foo(&a,&b) into the GIMPLE SSA
>> representation (gcc-4.5), and the subsequent optimization passes cannot
>> handle my instrumented code.  [original report and backtrace snipped]
>>
>> Does anyone know why the function arguments must be an SSA_VAR or
>> INDIRECT_REF here?  Have I missed any step needed to maintain the
>> consistency of GIMPLE SSA?
>>
> Yes.  is_gimple_val () will return false for your arguments as it seems that
> the variables do not have function invariant addresses.
>
> Richard.

Thanks. But how do I turn my arguments into gimple vals? By assigning each one to a temporary first and then replacing the argument with that temporary?

Hongtao
Re: Errors when invoking refs_may_alias_p_1
On 08/27/10 14:29, Richard Guenther wrote:
> On Fri, Aug 27, 2010 at 8:24 PM, Hongtao wrote:
>> On 08/27/10 12:35, Richard Guenther wrote:
>>> [earlier exchange snipped]
>>> Yes.  is_gimple_val () will return false for your arguments as it seems
>>> that the variables do not have function invariant addresses.
>>>
>>> Richard.
>>
>> Thanks. But how do I turn my arguments into gimple vals? By assigning each
>> one to a temporary first and then replacing the argument with that
>> temporary?
>>
> Yes, that will work.
>
> Richard.

OK. Do we have to rewrite it like this every time we insert a function call into a GIMPLE body whose argument is an expression?

Thanks,
Hongtao
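A minimal sketch of the approach Richard suggests, using GCC 4.5-era internal APIs; insert_foo_call, foo_decl and the variables are placeholder names, and the usual internal headers (tree.h, gimple.h, tree-flow.h) are assumed:

/* Gimplify &a and &b into temporaries so that every call argument
   satisfies is_gimple_val; force_gimple_operand_gsi inserts any needed
   assignment statements before *GSI.  */
static void
insert_foo_call (gimple_stmt_iterator *gsi, tree foo_decl, tree a, tree b)
{
  tree addr_a = force_gimple_operand_gsi (gsi, build_fold_addr_expr (a),
					  true, NULL_TREE, true, GSI_SAME_STMT);
  tree addr_b = force_gimple_operand_gsi (gsi, build_fold_addr_expr (b),
					  true, NULL_TREE, true, GSI_SAME_STMT);
  gimple call = gimple_build_call (foo_decl, 2, addr_a, addr_b);
  gsi_insert_before (gsi, call, GSI_SAME_STMT);
  update_stmt (call);
}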
About LTO merging symbol tables
Hi All,

Does the LTO module merge the global symbol tables maintained for each function? For example, if two function bodies both reference a global variable (say, aaa), are the VAR_DECL nodes for aaa in the two function bodies the same node?

Thanks,
Hongtao Yu
Purdue University
How to dump SSA in lto
Hi All,

I'm programming in the LTO phase. How can I dump the SSA representation after an LTO optimization? For example, if I would like to know the effect of the interprocedural pointer analysis (pass_ipa_pta), how can I dump the SSA form after that pass?

Thanks,
Hongtao Yu
Purdue University
Re: How to dump SSA in lto
Thanks very much. But I still want an option to dump the SSA form during or after the LTO optimizations, something like -fdump-tree-...

Hongtao

On 09/21/10 10:07, Richard Guenther wrote:
> On Tue, Sep 21, 2010 at 3:31 PM, Hongtao wrote:
>> Hi All,
>>
>> I'm programming in the LTO phase.  How can I dump the SSA representation
>> after an LTO optimization?  For example, if I would like to know the effect
>> of the interprocedural pointer analysis (pass_ipa_pta), how can I dump the
>> SSA form after that pass?
> That particular case would be -fdump-ipa-pta-alias
>
>> Thanks,
>> Hongtao Yu
>> Purdue University
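A rough sketch of the kind of invocation this ends up being (an assumption, not from the thread; behaviour and dump-file naming vary across GCC releases, and the dump flags have to be repeated on the link command so the LTO-time run of the pass honours them):

  gcc -O2 -flto -c f1.c f2.c
  gcc -O2 -flto -fdump-ipa-pta f1.o f2.o
  # link-time dumps are written next to the LTO output (e.g. *.wpa.* files),
  # alongside the per-TU dumps produced when compiling f1.c and f2.c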
Interprocedural points-to analysis
Hi All,

Has the interprocedural points-to analysis (pass_ipa_pta) been put into practice, i.e., is the IPA points-to information used to aid other optimizations?

Thanks,
Hongtao
About DECL_UID
Hi All,

Can the DECL_UIDs of two local variables from two separate functions be the same during LTO?

Thanks,
Hongtao Yu
Purdue University
Re: About DECL_UID
On 09/25/10 16:48, Diego Novillo wrote:
> On Sat, Sep 25, 2010 at 16:40, Hongtao wrote:
>
>> Can the DECL_UIDs of two local variables from two separate functions be
>> the same during LTO?
> No.  DECL_UIDs are unique within a single translation unit.
>
OK, thanks. But that means two local variables in different source files may end up with the same DECL_UID, even though LTO links the two source files together?

Hongtao

> Diego.
>
Map tree to properties
Hi All,

Do we have a mechanism to map a tree or gimple statement to a set of properties, so that we can transfer information from one pass to another?

Thanks,
Hongtao
Purdue University
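One common approach in current GCC sources is a hash_map keyed on the tree pointer (older releases used pointer_map/htab for the same job). A minimal sketch, where pass_info and the map names are made up and the usual GCC internal includes (config.h, system.h, coretypes.h, tree.h) are assumed:

#include "hash-map.h"

/* Per-tree side data owned by the pass that records it.  */
struct pass_info { int flags; };

static hash_map<tree, pass_info> *info_map;

static void
record_info (tree t, int flags)
{
  if (!info_map)
    info_map = new hash_map<tree, pass_info> ();
  info_map->get_or_insert (t).flags = flags;   /* inserts on first use */
}

static pass_info *
lookup_info (tree t)
{
  return info_map ? info_map->get (t) : NULL;  /* NULL if never recorded */
}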
Insert new global declaration to Gimple
Dear All,

How can I build a new global variable, as well as its initializer, in GIMPLE?

Thanks,
Hongtao Yu
Purdue University
Re: Insert new global declaration to Gimple
Sorry, I mean: if I have built a VAR_DECL node as well as its DECL_INITIAL, where should I place the node? Since it is a global variable, can I just build the VAR_DECL node without placing it in any container, say a symbol table or anywhere else?

Thanks,
Hongtao

On 10/12/10 10:11, Hongtao wrote:
> Dear All,
>
> How can I build a new global variable, as well as its initializer, in GIMPLE?
>
> Thanks,
> Hongtao Yu
> Purdue University
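A minimal sketch of one way to do this with the GCC 4.x-era internal API (make_global_int and the initializer value are made-up names for illustration; tree.h and cgraph.h are assumed to be included):

/* Build the decl, mark it static/public, attach the initializer, then
   hand it to the varpool so it is registered and emitted.  */
tree
make_global_int (const char *name, HOST_WIDE_INT init)
{
  tree decl = build_decl (UNKNOWN_LOCATION, VAR_DECL,
			  get_identifier (name), integer_type_node);
  TREE_STATIC (decl) = 1;	/* has a definition here */
  TREE_PUBLIC (decl) = 1;	/* externally visible */
  DECL_INITIAL (decl) = build_int_cst (integer_type_node, init);
  varpool_finalize_decl (decl);	/* places it in the symbol table */
  return decl;
}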
Options for dumping dependence checking results
Hi All,

What's the option for dumping the results of loop dependence checking, such as dependence relations, direction vectors, etc.?

Thanks,
Hongtao
ipa on all files together
Hi All,

While using gcc-4.6 with the option -flto, I found that interprocedural analyses were performed on each source file separately. For example, for the pass pass_ipa_pta, if we compile two files like:

  gcc -O -flto f1.c f2.c

the pass runs twice, once for each source file. So is there a way to perform IPA on all source files together?

Thanks,
Hongtao
Purdue University
Re: ipa on all files together
On 11/01/10 20:35, Diego Novillo wrote:
> On Mon, Nov 1, 2010 at 19:57, Hongtao wrote:
>> Hi All,
>>
>> While using gcc-4.6 with the option -flto, I found that interprocedural
>> analyses were performed on each source file separately.  For example, for
>> the pass pass_ipa_pta, if we compile two files like:
>>   gcc -O -flto f1.c f2.c
>> the pass runs twice, once for each source file.  So is there a way to
>> perform IPA on all source files together?
> With -combine you used to be able to do this, but it has been removed
> in favour of -flto (actually, I'm not quite sure whether it's been
> removed already, but it's on the chopping block).
>
> With -flto, IPA will be performed on all the files together, as well
> as on each file separately.  In your example, IPA runs 3 times: once for
> each of f1.c and f2.c, and a third time with both f1.o and f2.o as a
> single translation unit.

Thanks. But can I keep only the third run, i.e., can I run the pass only on all units together, without running it on each unit separately?

Hongtao

> Diego.
>
Re: [9/10 Regression] [PR87833] Intel MIC (emulated) offloading still broken (was: GCC 9.0.1 Status Report (2019-04-25))
On Tue, Apr 30, 2019 at 7:31 PM Jakub Jelinek wrote:
>
> On Tue, Apr 30, 2019 at 01:02:40PM +0200, Thomas Schwinge wrote:
> > Hi Jakub!
> >
> > On Tue, 30 Apr 2019 12:56:52 +0200, Jakub Jelinek wrote:
> > > On Tue, Apr 30, 2019 at 12:47:54PM +0200, Thomas Schwinge wrote:
> > > > Email to apparently no longer gets delivered.
> > > > Is there anyone else from Intel who'd take over maintenance?
> > >
> > > As your patch is to LTO option handling, I think you want a review from
> > > Honza.
> >
> > Well, I'm actually not asking for review of the WIP patch, but rather
> > looking for someone to take on ownership/maintenance of the functionality
> > of Intel MIC offloading.
>
> That would be indeed greatly appreciated.
>
> Jakub

I don't know this guy ilya.ver...@intel.com. Do you know him/her, H.J.?

--
BR,
Hongtao
Symbolic range analysis
Hi All,

Does GCC 4.6 perform symbolic range analysis or array section analysis?

Thanks,
Hongtao Yu
Purdue University
About new project
Hi All,

How can I set up a new project under GCC and make it open source? Thanks!

Cheers,
Hongtao
Re: About new project
On 1/27/2013 5:04 PM, Gerald Pfeifer wrote:
> On Sat, 26 Jan 2013, Hongtao Yu wrote:
>> How can I set up a new project under GCC and make it open source? Thanks!
> That depends on what you mean by "under GCC", I'd say.
>
> If you have improvements for GCC, submitting those as patches against GCC
> will be best, cf. http://gcc.gnu.org/contribute.html .
>
> If you want to work on an independent project, you can just go ahead and
> use one of those services like github, SourceForge etc.

Actually, we have designed and implemented a tentative demand-driven flow- and context-sensitive pointer analysis in GCC 4.7. This pointer analysis is used for pairwise data dependence checking for vectorization. Currently, it does not serve optimizations directly, although it may in the future.

Which way do you think is best for releasing our code: opening a branch inside GCC, or releasing a plugin for GCC?

Thanks!
Hongtao

> (Note that the GNU project talks about free software, cf.
> https://www.gnu.org/philosophy/free-software-for-freedom.html )
>
> Gerald
Re: [IMPORTANT] ChangeLog related changes
On Tue, May 26, 2020 at 6:49 AM Jakub Jelinek via Gcc-patches wrote:
>
> Hi!
>
> I've turned on the strict mode of Martin Liška's hook changes,
> which means that from now on no commits to the trunk or release branches
> should be changing any ChangeLog files together with the other files;
> the ChangeLog entry should be solely in the commit message.
> The DATESTAMP bumping script will be updating the ChangeLog files for you.

Oh, no wonder my patch was rejected by the git hook with the error message
---
ChangeLog files, DATESTAMP, BASE-VER and DEV-PHASE can be modified only separately from other files
---

> If somebody makes a mistake in that, please wait 24 hours (at least until

I committed a separate patch that touches only ChangeLog files; should I revert it?

> after 00:16 UTC after your commit) so that the script will create the
> ChangeLog entries, and afterwards it can be fixed by adjusting the ChangeLog
> files.  But you can only touch the ChangeLog files in that case (and
> shouldn't write a ChangeLog entry for that in the commit message).
>
> If anything goes wrong, please let me, other RMs and Martin Liška know.
>
> Jakub
>

--
BR,
Hongtao
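For reference, a commit message under the new scheme carries the ChangeLog entries in roughly this shape (the component, PR number, file and function names below are made-up examples, not taken from any real commit):

i386: Fix <short description> [PR90000]

gcc/ChangeLog:

	PR target/90000
	* config/i386/i386.c (some_function): Describe the change.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/pr90000.c: New test.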
Re: [IMPORTANT] ChangeLog related changes
Great, thanks! On Tue, May 26, 2020 at 2:08 PM Martin Liška wrote: > > On 5/26/20 7:22 AM, Hongtao Liu via Gcc wrote: > > i commit a separate patch alone only for ChangeLog files, should i revert > > it? > > Hello. > > I've just done it. > > Martin -- BR, Hongtao
Re: Help with PR97872
, I am not sure how to check if target can actually expand > > > > > > vec_cmp ? > > > > > > I assume that since expand_vec_cmp_expr_p queries optab and if it > > > > > > gets > > > > > > a valid cmp icode, that > > > > > > should be sufficient ? > > > > > > > > > > Yes > > > > Hi Richard, > > > > I tested the patch, and it shows one regression for pr78102.c, because > > > > of extra pcmpeqq in code-gen for x != y on x86. > > > > For the test-case: > > > > __v2di > > > > baz (const __v2di x, const __v2di y) > > > > { > > > > return x != y; > > > > } > > > > > > > > Before patch: > > > > baz: > > > > pcmpeqq %xmm1, %xmm0 > > > > pcmpeqd %xmm1, %xmm1 > > > > pandn %xmm1, %xmm0 > > > > ret > > > > > > > > After patch, > > > > Before ISEL: > > > > vector(2) _1; > > > > __v2di _4; > > > > > > > >[local count: 1073741824]: > > > > _1 = x_2(D) != y_3(D); > > > > _4 = VEC_COND_EXPR <_1, { -1, -1 }, { 0, 0 }>; > > > > return _4; > > > > > > > > After ISEL: > > > > vector(2) _1; > > > > __v2di _4; > > > > > > > >[local count: 1073741824]: > > > > _1 = x_2(D) != y_3(D); > > > > _4 = VIEW_CONVERT_EXPR<__v2di>(_1); > > > > return _4; > > > > > > > > which results in: > > > > pcmpeqq %xmm1, %xmm0 > > > > pxor%xmm1, %xmm1 > > > > pcmpeqq %xmm1, %xmm0 > > > > ret > > > > > > > > IIUC, the new code-gen is essentially comparing two args for equality, > > > > and then > > > > comparing the result against zero to invert it, so it looks correct ? > > > > I am not sure which of the above two sequences is better tho ? > > > > If the new code-gen is OK, would it be OK to adjust the test-case ? > > > > > > In case pcmpeqq is double-issue the first variant might be faster while > > > the second variant has the advantage of the "free" pxor, but back-to-back > > > pcmpeqq might have an issue. > > > > > > I think on GIMPLE the new code is preferable and adjustments are > > > target business. I wouldn't be surprised if the x86 backend > > > special-cases vcond to {-1,-1}, {0,0} already to arrive at the first > > > variant. > > > > > > Did you check how > > > > > > a = x != y ? { -1, -1 } : {0, 0 }; > > > b = x != y ? { 1, 2 } : { 3, 4 }; > > > > > > is handled before/after your patch? That is, make the comparison > > > CSEd between two VEC_COND_EXPRs? > > For the test-case: > > __v2di f(__v2di, __v2di); > > > > __v2di > > baz (const __v2di x, const __v2di y) > > { > > __v2di a = (x != y); > > __v2di b = (x != y) ? (__v2di) {1, 2} : (__v2di) {3, 4}; > > return f (a, b); > > } > > > > Before patch, isel converts both to .vcondeq: > > __v2di b; > > __v2di a; > > __v2di _8; > > > >[local count: 1073741824]: > > a_4 = .VCONDEQ (x_2(D), y_3(D), { -1, -1 }, { 0, 0 }, 114); > > b_5 = .VCONDEQ (x_2(D), y_3(D), { 1, 2 }, { 3, 4 }, 114); > > _8 = f (a_4, b_5); [tail call] > > return _8; > > > > and results in following code-gen: > > _Z3bazDv2_xS_: > > .LFB5666: > > pcmpeqq %xmm1, %xmm0 > > pcmpeqd %xmm1, %xmm1 > > movdqa %xmm0, %xmm2 > > pandn %xmm1, %xmm2 > > movdqa .LC0(%rip), %xmm1 > > pblendvb%xmm0, .LC1(%rip), %xmm1 > > movdqa %xmm2, %xmm0 > > jmp _Z1fDv2_xS_ > > > > With patch, isel converts a = (x != y) ? 
{-1, -1} : {0, 0} to > > view_convert_expr and the other > > to vcondeq: > > __v2di b; > > __v2di a; > > vector(2) _1; > > __v2di _8; > > > >[local count: 1073741824]: > > _1 = x_2(D) != y_3(D); > > a_4 = VIEW_CONVERT_EXPR<__v2di>(_1); > > b_5 = .VCONDEQ (x_2(D), y_3(D), { 1, 2 }, { 3, 4 }, 114); > > _8 = f (a_4, b_5); [tail call] > > return _8; > > > > which results in following code-gen: > > _Z3bazDv2_xS_: > > .LFB5666: > > pcmpeqq %xmm1, %xmm0 > > pxor%xmm2, %xmm2 > > movdqa .LC0(%rip), %xmm1 > > pblendvb%xmm0, .LC1(%rip), %xmm1 > > pcmpeqq %xmm0, %xmm2 > > movdqa %xmm2, %xmm0 > > jmp _Z1fDv2_xS_ > > Ok, thanks for checking. I think the patch is OK but please let > Hongtao the chance to comment. > > Richard. > > > Thanks, > > Prathamesh > > > > > > Thanks, > > > Richard. > > > > > > > > > > Thanks, > > > > Prathamesh > > > > > > > > > > > Thanks, > > > > > > Prathamesh > > > > > > > > > > > > > > Richard. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Thanks, > > > > > > > > Prathamesh > > > > > > > > > > > > > > > > > > > Thanks, > > > > > > > > > > Prathamesh > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > > > > Richard Biener > > > > > > > > > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 > > > > > > > > > Nuernberg, > > > > > > > > > Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg) > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > > Richard Biener > > > > > > > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 > > > > > > > Nuernberg, > > > > > > > Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg) > > > > > > > > > > > > > > > > -- > > > > > Richard Biener > > > > > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 > > > > > Nuernberg, > > > > > Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg) > > > > > > > > > > -- > > > Richard Biener > > > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg, > > > Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg) > > > > -- > Richard Biener > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg, > Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg) -- BR, Hongtao
Re: Help with PR97872
On Mon, Dec 7, 2020 at 7:11 PM Prathamesh Kulkarni wrote: > > On Mon, 7 Dec 2020 at 16:15, Hongtao Liu wrote: > > > > On Mon, Dec 7, 2020 at 5:47 PM Richard Biener wrote: > > > > > > On Mon, 7 Dec 2020, Prathamesh Kulkarni wrote: > > > > > > > On Mon, 7 Dec 2020 at 13:01, Richard Biener wrote: > > > > > > > > > > On Mon, 7 Dec 2020, Prathamesh Kulkarni wrote: > > > > > > > > > > > On Fri, 4 Dec 2020 at 17:18, Richard Biener > > > > > > wrote: > > > > > > > > > > > > > > On Fri, 4 Dec 2020, Prathamesh Kulkarni wrote: > > > > > > > > > > > > > > > On Thu, 3 Dec 2020 at 16:35, Richard Biener > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > On Thu, 3 Dec 2020, Prathamesh Kulkarni wrote: > > > > > > > > > > > > > > > > > > > On Tue, 1 Dec 2020 at 16:39, Richard Biener > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > > On Tue, 1 Dec 2020, Prathamesh Kulkarni wrote: > > > > > > > > > > > > > > > > > > > > > > > Hi, > > > > > > > > > > > > For the test mentioned in PR, I was trying to see if we > > > > > > > > > > > > could do > > > > > > > > > > > > specialized expansion for vcond in target when operands > > > > > > > > > > > > are -1 and 0. > > > > > > > > > > > > arm_expand_vcond gets the following operands: > > > > > > > > > > > > (reg:V8QI 113 [ _2 ]) > > > > > > > > > > > > (reg:V8QI 117) > > > > > > > > > > > > (reg:V8QI 118) > > > > > > > > > > > > (lt (reg/v:V8QI 115 [ a ]) > > > > > > > > > > > > (reg/v:V8QI 116 [ b ])) > > > > > > > > > > > > (reg/v:V8QI 115 [ a ]) > > > > > > > > > > > > (reg/v:V8QI 116 [ b ]) > > > > > > > > > > > > > > > > > > > > > > > > where r117 and r118 are set to vector constants -1 and > > > > > > > > > > > > 0 respectively. > > > > > > > > > > > > However, I am not sure if there's a way to check if the > > > > > > > > > > > > register is > > > > > > > > > > > > constant during expansion time (since we don't have df > > > > > > > > > > > > analysis yet) ? > > > > It seems to me that all you need to do is relax the predicates of op1 > > and op2 in vcondmn to accept const0_rtx and constm1_rtx. I haven't > > debugged it, but I see that vcondmn in neon.md only accepts > > s_register_operand. > > > > (define_expand "vcond" > > [(set (match_operand:VDQW 0 "s_register_operand") > > (if_then_else:VDQW > > (match_operator 3 "comparison_operator" > > [(match_operand:VDQW 4 "s_register_operand") > > (match_operand:VDQW 5 "reg_or_zero_operand")]) > > (match_operand:VDQW 1 "s_register_operand") > > (match_operand:VDQW 2 "s_register_operand")))] > > "TARGET_NEON && (! || flag_unsafe_math_optimizations)" > > { > > arm_expand_vcond (operands, mode); > > DONE; > > }) > > > > in sse.md it's defined as > > (define_expand "vcondu" > > [(set (match_operand:V_512 0 "register_operand") > > (if_then_else:V_512 > > (match_operator 3 "" > > [(match_operand:VI_AVX512BW 4 "nonimmediate_operand") > > (match_operand:VI_AVX512BW 5 "nonimmediate_operand")]) > > (match_operand:V_512 1 "general_operand") > > (match_operand:V_512 2 "general_operand")))] > > "TARGET_AVX512F > >&& (GET_MODE_NUNITS (mode) > >== GET_MODE_NUNITS (mode))" > > { > > bool ok = ix86_expand_int_vcond (operands); > > gcc_assert (ok); > > DONE;
Re: Help with PR97872
It seems better with your PR97872 fix on i386.

cat test.c
typedef char v16qi __attribute__ ((vector_size(16)));
v16qi
f1 (v16qi a, v16qi b)
{
  return (a & b) != 0;
}

Before:
f1(char __vector(16), char __vector(16)):
        pand    %xmm1, %xmm0
        pxor    %xmm1, %xmm1
        pcmpeqb %xmm1, %xmm0
        pcmpeqd %xmm1, %xmm1
        pandn   %xmm1, %xmm0
        ret

After the PR97872 fix:
f1(char __vector(16), char __vector(16)):
        pand    xmm0, xmm1
        pxor    xmm1, xmm1
        pcmpeqb xmm0, xmm1
        pcmpeqb xmm0, xmm1
        ret

On Wed, Dec 9, 2020 at 7:47 PM Prathamesh Kulkarni wrote:
>
> On Tue, 8 Dec 2020 at 14:36, Prathamesh Kulkarni wrote:
> >
> > On Mon, 7 Dec 2020 at 17:37, Hongtao Liu wrote:
> > [rest of the previously quoted thread snipped]
Re: State of AutoFDO in GCC
Andi, thanks for pointing out the perf script issues. Can you please elaborate a bit on the exact issues you have seen? We've been using specific output of perf script, such as mmap, LBR and callstack events filtered by process id. It works fine so far, but may certainly hit issues in the future with extended uses.

Thanks,
Hongtao

From: Xinliang David Li
Date: Monday, April 26, 2021 at 11:05 AM
To: Andi Kleen
Cc: Jan Hubicka, gcc@gcc.gnu.org, Wei Mi, Eugene Rozenfeld, Wenlei He, Hongtao Yu
Subject: Re: State of AutoFDO in GCC

On Mon, Apr 26, 2021 at 11:00 AM Andi Kleen <a...@linux.intel.com> wrote:

> There are multiple directional changes in this new tool:
> 1) it uses perf-script trace output (in text) as input profile data;

I suspect this will break regularly too (I personally did numerous changes to perf script output, and also wrote a lot of parsing scripts). The perf script output has some bad problems, e.g. for file names or processes with spaces and some other issues. To make it handleable would need some redesign to actually generate a machine-friendly format.

Andi, thanks for the input. +authors of the llvm-profgen tool for their experience with using perf script output.

David

A perf.data parser should be fine, just don't fill it up with asserts and "be liberal in what you accept" and ignore unknown records.

-Andi
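For context, the AutoFDO pipeline being discussed looks roughly like this; the exact perf event name and create_gcov flags are assumptions that depend on the CPU and the autofdo tool version:

  # record LBR samples (event name varies by microarchitecture)
  perf record -b -e br_inst_retired:near_taken -o perf.data -- ./workload

  # convert the perf profile into GCC's gcov format with the create_gcov
  # tool from the autofdo package
  create_gcov --binary=./workload --profile=perf.data --gcov=workload.afdo

  # rebuild with the sampled profile
  gcc -O2 -fauto-profile=workload.afdo -o workload workload.c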
RE: Hongtao Liu as x86 vectorization maintainer
>-Original Message- >From: Jason Merrill >Sent: Monday, June 21, 2021 10:07 AM >To: Liu, Hongtao >Cc: gcc Mailing List ; Marek Polacek >Subject: Hongtao Liu as x86 vectorization maintainer > >I am pleased to announce that the GCC Steering Committee has appointed >Hongtao Liu as maintainer of the i386 vector extensions in GCC. > >Hongtao, please update your listing in the MAINTAINERS file. Updated, thanks. > >Cheers, >Jason
Re: Hongtao Liu as x86 vectorization maintainer
On Tue, Jun 22, 2021 at 3:58 PM Jakub Jelinek via Gcc wrote: > > On Mon, Jun 21, 2021 at 02:49:56AM +, Liu, Hongtao via Gcc wrote: > > >-Original Message- > > >From: Jason Merrill > > >Sent: Monday, June 21, 2021 10:07 AM > > >To: Liu, Hongtao > > >Cc: gcc Mailing List ; Marek Polacek > > >Subject: Hongtao Liu as x86 vectorization maintainer > > > > > >I am pleased to announce that the GCC Steering Committee has appointed > > >Hongtao Liu as maintainer of the i386 vector extensions in GCC. > > > > > >Hongtao, please update your listing in the MAINTAINERS file. > > > > Updated, thanks. > > Congrats. > > You should also remove your Write After Approval entry, otherwise > Running .../gcc/testsuite/gcc.src/maintainers.exp ... > Redundant in write approval: Hongtao Liu > FAIL: maintainers-verify.sh > test fails. > Thanks for reminding, updated. > Jakub > -- BR, Hongtao
[Questions] Is there any bit in gimple/rtl to indicate whether this IR supports fast-math or not?
Hi:

The original problem was that some users wanted the command-line option -ffast-math not to act on intrinsic production code, i.e., for code like

#include <immintrin.h>
__m256d
foo2 (__m256d a, __m256d b, __m256d c, __m256d d)
{
  __m256d tmp = _mm256_add_pd (a, b);
  tmp = _mm256_sub_pd (tmp, c);
  tmp = _mm256_sub_pd (tmp, d);
  return tmp;
}

compiled with -O2 -mavx2 -ffast-math, users expected code like

  vaddpd ymm0, ymm0, ymm1
  vsubpd ymm0, ymm0, ymm2
  vsubpd ymm0, ymm0, ymm3

but not

  vsubpd ymm1, ymm1, ymm2
  vsubpd ymm0, ymm0, ymm3
  vaddpd ymm0, ymm1, ymm0

On the LLVM side, there are mechanisms like

  #pragma float_control(precise, on, push)
  ... (intrinsics definitions) ...
  #pragma float_control(pop)

When intrinsics are inlined, their IR is marked with "no-fast-math", and even if the caller is compiled with -ffast-math, reassociation only happens to those IRs which are not marked with "no-fast-math". It seems more flexible to support fast-math control at region granularity (inside a function).

Does GCC have a similar mechanism?

--
BR,
Hongtao
Re: [Questions] Is there any bit in gimple/rtl to indicate whether this IR supports fast-math or not?
On Wed, Jul 14, 2021 at 1:15 PM Hongtao Liu wrote:
>
> Hi:
>     The original problem was that some users wanted the command-line option
> -ffast-math not to act on intrinsic production code, i.e., for code like
>
> [quoted example and details snipped]
>
> Does GCC have a similar mechanism?

Testcase: https://godbolt.org/z/9cYMGGWPG

--
BR,
Hongtao
Re: [Questions] Is there any bit in gimple/rtl to indicate whether this IR supports fast-math or not?
On Wed, Jul 14, 2021 at 2:39 PM Matthias Kretz wrote:
> On Wednesday, 14 July 2021 07:18:29 CEST Hongtao Liu via Gcc-help wrote:
> > On Wed, Jul 14, 2021 at 1:15 PM Hongtao Liu wrote:
> > > Hi:
> > >     The original problem was that some users wanted the command-line
> > > option -ffast-math not to act on intrinsic production code.
>
> This sounds like the users want intrinsics to map *directly* to the
> corresponding instruction.

Thanks for the reply. I think the users want to mix fast-math and no-fast-math code.

> If that's the case such users should use inline assembly, IMHO.  If you
> compile a TU with -ffast-math then *all* floating-point operations are
> affected.  Yes, more control over where to use fast-math and the ability
> to mix fast-math and no-fast-math without risking ODR violations would be
> great.  But that's a larger issue, and one that would ideally be solved in
> WG14/WG21.

Hmm, I guess that would need a lot of work.

> FWIW, this is what I'd do, i.e. turn off fast-math for the function in
> question:
> https://godbolt.org/z/3cKq5hT1o
>
> --
> ──
> Dr. Matthias Kretz                           https://mattkretz.github.io
> GSI Helmholtz Centre for Heavy Ion Research               https://gsi.de
> std::experimental::simd             https://github.com/VcDevel/std-simd
> ──

--
BR,
Hongtao
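Something along these lines is the per-function workaround being referred to; a sketch only, assuming the optimize attribute accepts this spelling, and keeping in mind the thread's caveat that differing fast-math flags can block inlining:

#include <immintrin.h>

/* Whole TU built with -O2 -mavx2 -ffast-math; opt this one function out
   of fast-math reassociation via the optimize attribute.  */
__attribute__((optimize("no-fast-math")))
__m256d
foo2 (__m256d a, __m256d b, __m256d c, __m256d d)
{
  __m256d tmp = _mm256_add_pd (a, b);
  tmp = _mm256_sub_pd (tmp, c);
  tmp = _mm256_sub_pd (tmp, d);
  return tmp;
}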
Re: [Questions] Is there any bit in gimple/rtl to indicate whether this IR supports fast-math or not?
On Wed, Jul 14, 2021 at 3:49 PM Matthias Kretz wrote:
> On Wednesday, 14 July 2021 09:39:42 CEST Richard Biener wrote:
> > -ffast-math decomposes to quite some flag_* and those generally are not
> > reflected into the IL but can be different per function (and then
> > prevent inlining).
>
> Is there any chance the "and then prevent inlining" can be eliminated?
> Because then I could write my own fast class in C++, marking all operators
> with __attribute__((optimize("-Ofast")))...
>
> > There's one "related" IL feature used by the Fortran frontend - PAREN_EXPR
> > prevents association across it.  So for Fortran (when not
> > -fno-protect-parens which is enabled by -Ofast), (a + b) - b cannot be
> > optimized to a.  Eventually this could be used to wrap intrinsic results
> > since most of the issues in the end require association.  Note PAREN_EXPR
> > isn't exposed to the C family frontends but we could of course add a
> > builtin-like thing for this _Noassoc ( ) or so.  Note PAREN_EXPR survives
> > -Ofast so it's the frontends that would need to choose to emit or not emit
> > it (or always emit it).

After a simple grep, I see PAREN_EXPR is expanded to the common RTL pattern, so it doesn't prevent any reassociation at the RTL level?

> Interesting.  I want that builtin in C++.  Currently I use inline asm to
> achieve a similar effect.  But the inline asm hammer is really too big for
> the problem.
>
> --
> ──
> Dr. Matthias Kretz                           https://mattkretz.github.io
> GSI Helmholtz Centre for Heavy Ion Research               https://gsi.de
> std::experimental::simd             https://github.com/VcDevel/std-simd
> ──

--
BR,
Hongtao
Re: [Questions] Is there any bit in gimple/rtl to indicate whether this IR supports fast-math or not?
On Wed, Jul 14, 2021 at 4:17 PM Richard Biener wrote: > > On Wed, Jul 14, 2021 at 10:11 AM Hongtao Liu wrote: > > > > On Wed, Jul 14, 2021 at 3:49 PM Matthias Kretz wrote: > > > > > > On Wednesday, 14 July 2021 09:39:42 CEST Richard Biener wrote: > > > > -ffast-math decomposes to quite some flag_* and those generally are not > > > > reflected into the IL but can be different per function (and then > > > > prevent inlining). > > > > > > Is there any chance the "and then prevent inlining" can be eliminated? > > > Because > > > then I could write my own fast class in C++, marking all operators > > > with > > > __attribute__((optimize("-Ofast")))... > > > > > > > There's one "related" IL feature used by the Fortran frontend - > > > > PAREN_EXPR > > > > prevents association across it. So for Fortran (when not > > > > -fno-protect-parens which is enabled by -Ofast), (a + b) - b cannot be > > > > optimized to a. Eventually this could be used to wrap intrinsic results > > > > since most of the issues in the end require association. Note > > > > PAREN_EXPR > > > > isn't exposed to the C family frontends but we could of course add a > > > > builtin-like thing for this _Noassoc ( ) or so. Note PAREN_EXPR > > after a simple grep, I see PAREN_EXPR is expanded to the common RTL > > pattern. So it doesn't prevent any reassociation at the rtl level? > > We don't perform any FP reassociation on RTL (and yes, the above relies on -ffast-math will imply flag_associative_math, and w/ that we do have reassociation on RTL /* Reassociate floating point addition only when the user specifies associative math operations. */ if (FLOAT_MODE_P (mode) && flag_associative_math) { tem = simplify_associative_operation (code, mode, op0, op1); if (tem) return tem; } > this). We're also expanding rint() to x + 2**52 - 2**52 (ix86_expand_rint) > even > with -ffast-math so we do rely on RTL optimizations not cancelling the +-. > > Richard. > > > > > > > survives -Ofast so it's the frontends that would need to choose to emit > > > > or > > > > not emit it (or always emit it). > > > > > > Interesting. I want that builtin in C++. Currently I use inline asm to > > > achieve > > > a similar effect. But the inline asm hammer is really too big for the > > > problem. > > > > > > > > > -- > > > ── > > > Dr. Matthias Kretz https://mattkretz.github.io > > > GSI Helmholtz Centre for Heavy Ion Research https://gsi.de > > > std::experimental::simd https://github.com/VcDevel/std-simd > > > ── > > > > > > > > -- > > BR, > > Hongtao -- BR, Hongtao
Re: How to detect whether the user uses -masm=intel?
On Thu, Jul 29, 2021 at 10:49 AM unlvsur unlvsur via Gcc wrote:
>
> What I mean is: what macro does GCC set when it compiles with -masm=intel?
>
> int main()
> {
> #ifdef /*__INTEL_ASM*/
>   printf("intel");
> #else
>   printf("at&t");
> #endif
> }

I don't fully understand what you're seeking; probably you're looking for ASSEMBLER_DIALECT. Cut from i386.c:

void
ix86_print_operand (FILE *file, rtx x, int code)
{
  if (code)
    {
      switch (code)
	{
	case 'A':
	  switch (ASSEMBLER_DIALECT)
	    {
	    case ASM_ATT:
	      putc ('*', file);
	      break;

	    case ASM_INTEL:
	      /* Intel syntax.  For absolute addresses, registers
		 should not be surrounded by braces.  */
	      if (!REG_P (x))
		{
		  putc ('[', file);
		  ix86_print_operand (file, x, 0);
		  putc (']', file);
		  return;
		}
	      break;

> Sent from Mail <https://go.microsoft.com/fwlink/?LinkId=550986> for Windows 10
>
> From: Andrew Pinski <pins...@gmail.com>
> Sent: Wednesday, July 28, 2021 21:43
> To: unlvsur unlvsur <unlv...@live.com>
> Cc: gcc@gcc.gnu.org
> Subject: Re: How to detect whether the user uses -masm=intel?
>
> On Wed, Jul 28, 2021 at 6:41 PM unlvsur unlvsur via Gcc wrote:
> >
> > Any GCC macro that can tell the code it is using the Intel format's
> > assembly instead of AT&T?
>
> Inside the inline-asm you can use the alternative.
> Like this:
> cmp{b}\t{%1, %h0|%h0, %1}
>
> This is how GCC implements this inside too.
>
> Thanks,
> Andrew

--
BR,
Hongtao
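To illustrate the dialect alternatives Andrew mentions, a small sketch (which registers the allocator picks is up to it):

/* The {att|intel} braces select the variant matching -masm: e.g.
   "movl %eax, %edx" under -masm=att and "mov edx, eax" under -masm=intel;
   the {l} suffix is only emitted for the AT&T dialect.  */
static inline int
copy_reg (int x)
{
  int r;
  __asm__ ("mov{l}\t{%1, %0|%0, %1}" : "=r" (r) : "r" (x));
  return r;
}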
Re: Suboptimal code generated for __builtin_ceil on AMD64 without SSE4.1
Could you file a bugzilla for that? https://gcc.gnu.org/bugzilla/enter_bug.cgi?product=gcc On Thu, Aug 5, 2021 at 3:34 PM Stefan Kanthak wrote: > > Hi, > > targeting AMD64 alias x86_64 with -O3, GCC 10.2.0 generates the > following code (17 instructions using 78 bytes, plus 6 quadwords > using 48 bytes) for __builtin_ceil() when -msse4.1 is NOT given: > > .text >0: f2 0f 10 15 10 00 00 00 movsd .LC1(%rip), %xmm2 > 4: R_X86_64_PC32.rdata >8: f2 0f 10 25 00 00 00 00 movsd .LC0(%rip), %xmm4 > c: R_X86_64_PC32.rdata > 10: 66 0f 28 d8 movapd %xmm0, %xmm3 > 14: 66 0f 28 c8 movapd %xmm0, %xmm1 > 18: 66 0f 54 da andpd %xmm2, %xmm3 > 1c: 66 0f 2e e3 ucomisd %xmm3, %xmm4 > 20: 76 2b jbe4d <_ceil+0x4d> > 22: f2 48 0f 2c c0 cvttsd2si %xmm0, %rax > 27: 66 0f ef db pxor %xmm3, %xmm3 > 2b: f2 0f 10 25 20 00 00 00 movsd 0x20(%rip), %xmm4 > 2f: R_X86_64_PC32 .rdata > 33: 66 0f 55 d1 andnpd %xmm1, %xmm2 > 37: f2 48 0f 2a d8 cvtsi2sd %rax, %xmm3 > 3c: f2 0f c2 c3 06 cmpnlesd %xmm3, %xmm0 > 41: 66 0f 54 c4 andpd %xmm4, %xmm0 > 45: f2 0f 58 c3 addsd %xmm3, %xmm0 > 49: 66 0f 56 c2 orpd %xmm2, %xmm0 > 4d: c3 retq > > .rdata > .align 8 >0: 00 00 00 00 .LC0: .quad 0x1.0p52 > 00 00 30 43 > 00 00 00 00 > 00 00 00 00 > .align 16 > 10: ff ff ff ff .LC1: .quad ~(-0.0) > ff ff ff 7f > 18: 00 00 00 00 .quad 0.0 > 00 00 00 00 > .align 8 > 20: 00 00 00 00 .LC2: .quad 0x1.0p0 > 00 00 f0 3f > 00 00 00 00 > 00 00 00 00 > .end > > JFTR: in the best case, the memory accesses cost several cycles, > while in the worst case they yield a page fault! > > > Properly optimized, faster and shorter code, using just 15 instructions > in 65 bytes, WITHOUT superfluous constants, thus avoiding costly memory > accesses and saving at least 32 bytes, follows: > > .intel_syntax > .equBIAS, 1023 > .text >0: f2 48 0f 2c c0cvttsd2si rax, xmm0 # rax = trunc(argument) >5: 48 f7 d8 neg rax > # jz .L0 # argument zero? >8: 70 36 jo .L0 # argument indefinite? ># argument overflows > 64-bit integer? >a: 48 f7 d8 neg rax >d: f2 48 0f 2a c8cvtsi2sd xmm1, rax # xmm1 = trunc(argument) > 12: 48 a1 00 00 00mov rax, BIAS << 52 > 19: 00 00 00 f0 3f > 1c: 66 48 0f 6e d0movqxmm2, rax# xmm2 = 0x1.0p0 > 21: f2 0f 10 d8 movsd xmm3, xmm0 # xmm3 = argument > 25: f2 0f c2 d9 02cmplesd xmm3, xmm1 # xmm3 = (argument <= > trunc(argument)) ? ~0L : 0L > 2a: 66 0f 55 da andnpd xmm3, xmm2 # xmm3 = (argument <= > trunc(argument)) ? 0.0 : 1.0 > 2e: f2 0f 58 d9 addsd xmm3, xmm1 # xmm3 = (argument > > trunc(argument)) ? 1.0 : 0.0 ># + trunc(argument) ># = ceil(argument) > 32: 66 0f 73 d0 3fpsrlq xmm0, 63 > 37: 66 0f 73 f0 3fpsllq xmm0, 63 # xmm0 = (argument & -0.0) > ? -0.0 : 0.0 > 3c: 66 0f 56 c3 orpdxmm0, xmm3 # xmm0 = ceil(argument) > 40: c3 .L0: ret > .end > > regards > Stefan -- BR, Hongtao
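For reference, a minimal source that should reproduce the sequence under discussion (an assumption, since the original poster's exact test isn't shown):

/* Build with: gcc -O3 -mno-sse4.1 on x86-64.  */
double
my_ceil (double x)
{
  return __builtin_ceil (x);
}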
Re: Why vectorization didn't turn on by -O2
On Thu, Aug 5, 2021 at 5:20 AM Segher Boessenkool wrote: > > On Wed, Aug 04, 2021 at 11:22:53AM +0100, Richard Sandiford wrote: > > Segher Boessenkool writes: > > > On Wed, Aug 04, 2021 at 10:10:36AM +0100, Richard Sandiford wrote: > > >> Richard Biener writes: > > >> > Alternatively only enable loop vectorization at -O2 (the above checks > > >> > flag_tree_slp_vectorize as well). At least the cost model kind > > >> > does not have any influence on BB vectorization, that is, we get the > > >> > same pros and cons as we do for -O3. > > >> > > >> Yeah, but a lot of the loop vector cost model choice is about controlling > > >> code size growth and avoiding excessive runtime versioning tests. > > > > > > Both of those depend a lot on the target, and target-specific conditions > > > as well (which CPU model is selected for example). Can we factor that > > > in somehow? Maybe we need some target hook that returns the expected > > > percentage code growth for vectorising a given loop, for example, and > > > -O2 vs. -O3 then selects what percentage is acceptable. > > > > > >> BB SLP > > >> should be a win on both code size and performance (barring significant > > >> target costing issues). > > > > > > Yeah -- but this could use a similar hook as well (just a straightline > > > piece of code instead of a loop). > > > > I think anything like that should be driven by motivating use cases. > > It's not something that we can easily decide in the abstract. > > > > The results so far with using very-cheap at -O2 have been promising, > > so I don't think new hooks should block that becoming the default. > > Right, but it wouldn't hurt to think a sec if we are on the right path > forward. It's is crystal clear that to make good decisions about what > and how to vectorise you need to take *some* target characteristics into > account, and that will have to happen sooner rather than later. > > This was all in reply to > > > >> Yeah, but a lot of the loop vector cost model choice is about controlling > > >> code size growth and avoiding excessive runtime versioning tests. > > It was not meant to hold up these patches :-) > > > >> PR100089 was an exception because we ended up keeping unvectorised > > >> scalar code that would never have existed otherwise. BB SLP proper > > >> shouldn't have that problem. > > > > > > It also is a tiny piece of code. There will always be tiny examples > > > that are much worse (or much better) than average. > > > > Yeah, what makes PR100089 important isn't IMO the test itself, but the > > underlying problem that the PR exposed. Enabling this “BB SLP in loop > > vectorisation” code can lead to the generation of scalar COND_EXPRs even > > though we know that ifcvt doesn't have a proper cost model for deciding > > whether scalar COND_EXPRs are a win. > > > > Introducing scalar COND_EXPRs at -O3 is arguably an acceptable risk > > (although still dubious), but I think it's something we need to avoid > > for -O2, even if that means losing the optimisation. > > Yeah -- -O2 should almost always do the right thing, while -O3 can do > bad things more often, it just has to be better "on average". > > > Segher Move thread to gcc-patches and gcc -- BR, Hongtao
Re: Enable the vectorizer at -O2 for GCC 12
On Tue, Aug 31, 2021 at 11:11 AM Kewen.Lin via Gcc wrote:
>
> on 2021/8/30 下午10:11, Bill Schmidt wrote:
> > On 8/30/21 8:04 AM, Florian Weimer wrote:
> >> There has been a discussion, both off-list and on the gcc-help mailing
> >> list ("Why vectorization didn't turn on by -O2", spread across several
> >> months), about enabling the auto-vectorizer at -O2, similar to what
> >> Clang does.
> >>
> >> I think the review concluded that the very cheap cost model should be
> >> used for that.
> >>
> >> Are there any remaining blockers?
> >
> > Hi Florian,
> >
> > I don't think I'd characterize it as having blockers, but we are continuing
> > to investigate small performance issues that arise with very-cheap,
> > including some things that regressed in GCC 12.  Kewen Lin is leading that
> > effort.  Kewen, do you feel we have any major remaining concerns with this
> > plan?
>
> Hi Florian & Bill,
>
> There are some small performance issues like PR101944 and PR102054, and
> still two degraded bmks (P9 520.omnetpp_r -2.41% and P8 526.blender_r
> -1.31%) to be investigated/clarified, but since their performance numbers
> with separated loop and slp vectorization options look neutral, they are
> very likely noises.  IMHO I don't think they are/will be blockers.
>
> So I think it's good to turn this on by default for Power.

The Intel side is also willing to enable -O2 vectorization, after having measured the performance impact on SPEC2017 and EEMBC. Meanwhile we are investigating PR101908/PR101909/PR101910/PR92740, where -O2 vectorization is reported to regress additional benchmarks on znver and kabylake.

> BR,
> Kewen

--
BR,
Hongtao
Re: Enable the vectorizer at -O2 for GCC 12
On Wed, Sep 1, 2021 at 7:24 PM Tamar Christina via Gcc wrote:
>
> -- edit, added list back in --
>
> Just to add some AArch64 numbers: for SPEC2017 we see a 2.1% overall geomean
> improvement (all from x264, as expected) with no real regressions (everything
> within variance) and only a 0.06% binary size increase overall (of which x264
> grew 0.15%) using the very cheap cost model.
>
> So we'd be quite keen on this as well.
>
> Cheers,
> Tamar
>
> > -----Original Message-----
> > From: Gcc On Behalf Of Florian Weimer via Gcc
> > Sent: Monday, August 30, 2021 2:05 PM
> > To: gcc@gcc.gnu.org
> > Cc: ja...@redhat.com; Richard Earnshaw; Segher Boessenkool; Richard
> > Sandiford; premachandra.malla...@amd.com; Hongtao Liu
> > Subject: Enable the vectorizer at -O2 for GCC 12
> >
> > There has been a discussion, both off-list and on the gcc-help mailing list
> > ("Why vectorization didn't turn on by -O2", spread across several months),
> > about enabling the auto-vectorizer at -O2, similar to what Clang does.
> >
> > I think the review concluded that the very cheap cost model should be used
> > for that.
> >
> > Are there any remaining blockers?
> >
> > Thanks,
> > Florian

A patch to enable auto-vectorization at -O2 with the very-cheap cost model is posted at [1].

[1] https://gcc.gnu.org/pipermail/gcc-patches/2021-September/578877.html

--
BR,
Hongtao
RE: GCC/OpenMP offloading for Intel GPUs?
I got some feedback from my colleague.

What we need from GCC:
1. generate SPIR-V
2. an offload bundler to create a FAT object

If the answer is yes for both, they can hook it up with the libomptarget library and our IGC back end.

>-----Original Message-----
>From: Thomas Schwinge
>Sent: Wednesday, September 15, 2021 12:57 AM
>To: gcc@gcc.gnu.org
>Cc: Jakub Jelinek; Tobias Burnus; Kirill Yukhin; Liu, Hongtao
>Subject: GCC/OpenMP offloading for Intel GPUs?
>
>Hi!
>
>I've had a person ask about GCC/OpenMP offloading for Intel GPUs (the new
>ones, not MIC, obviously), to complement the existing support for Nvidia and
>AMD GPUs.  Is there any statement other than "ought to be doable; someone
>needs to contribute the work"?
>
>
>Grüße
> Thomas
>-
>Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201,
>80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer:
>Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München;
>Registergericht München, HRB 106955
RE: GCC/OpenMP offloading for Intel GPUs?
Reply from Xinmin, and adding him to this thread.

IGC is open source. It takes SPIR-V IR and LLVM IR. We need a "GCC IR to SPIR-V translator" similar to the "LLVM IR to SPIR-V translator" we have for LLVM IR. How does GCC support device libraries?

>-----Original Message-----
>From: Thomas Schwinge
>Sent: Wednesday, September 15, 2021 7:20 PM
>To: Liu, Hongtao
>Cc: gcc@gcc.gnu.org; Jakub Jelinek; Tobias Burnus; Kirill Yukhin; Richard Biener
>Subject: RE: GCC/OpenMP offloading for Intel GPUs?
>
>Hi!
>
>On 2021-09-15T02:00:33+, "Liu, Hongtao via Gcc" wrote:
>> I got some feedback from my colleague
>
>Thanks for reaching out to them.
>
>> What we need from GCC:
>> 1. generate SPIR-V
>> 2. an offload bundler to create a FAT object
>>
>> If the answer is yes for both, they can hook it up with the libomptarget
>> library and our IGC back end.
>
>OK, I didn't remember Intel's use of SPIR-V as intermediate representation
>(but that's certainly good!), and leaving aside the technical/implementation
>issues (regarding libomptarget etc. use, as brought up by Jakub), the question
>then is: are Intel planning to do that work (themselves, like for Intel MIC
>offloading back then), or interested in hiring someone to do it, or not?
>
>
>Grüße
> Thomas
>
>[earlier quoted message and company footer snipped]
Re: _Float16-related failures on x86_64-apple-darwin
GCC defines __FLT_EVAL_METHOD__ according to builtin_define_with_int_value ("__FLT_EVAL_METHOD__", c_flt_eval_method (true)); and I guess we need to handle things like: /* GCC only supports one interchange type right now, _Float16. If we're evaluating _Float16 in 16-bit precision, then flt_eval_method will be FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16. */ + if (x == FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16 + && x == y) +return FLT_EVAL_METHOD_PROMOTE_TO_FLOAT; if (x == FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16) return y; I'm testing the patch but still need approval from related MAINTAINERs. On Fri, Dec 24, 2021 at 7:15 AM FX via Gcc wrote: > > > I’m not sure what the fix should be, either. We could use fixinclude to > > make the darwin headers happy, but we don’t really have a macro to provide > > the right value. Like a __FLT_EVAL_METHOD_OLDSTYLE__ macro. > > > > What should be the float_t and double_t types for FLT_EVAL_METHOD == 16? > > float and double, if I understand right? > > This is one possibility, assuming I am right about the types: > > diff --git a/fixincludes/inclhack.def b/fixincludes/inclhack.def > index 46e3b8c993a..bea85ef7367 100644 > --- a/fixincludes/inclhack.def > +++ b/fixincludes/inclhack.def > @@ -1767,6 +1767,18 @@ fix = { > test_text = ""; /* Don't provide this for wrap fixes. */ > }; > > +/* The darwin headers don't accept __FLT_EVAL_METHOD__ == 16. > +*/ > +fix = { > +hackname = darwin_flt_eval_method; > +mach = "*-*-darwin*"; > +files = math.h; > +select= "^#if __FLT_EVAL_METHOD__ == 0$"; > +c_fix = format; > +c_fix_arg = "#if __FLT_EVAL_METHOD__ == 0 || __FLT_EVAL_METHOD__ == 16"; > +test_text = "#if __FLT_EVAL_METHOD__ == 0"; > +}; > + > /* > * Fix on Digital UNIX V4.0: > * It contains a prototype for a DEC C internal asm() function, > > > Sucks to have to fix headers… and we certainly can’t fix people’s code that > may depend on __FLT_EVAL_METHOD__ having well-defined values. So not convinced > this is the right approach. > > FX -- BR, Hongtao
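For context, the block the fixinclude rewrites is the float_t/double_t selection in Darwin's math.h. The following is only a reconstruction from the select/c_fix_arg strings above plus the standard meaning of FLT_EVAL_METHOD, not the actual Apple header; it shows why the new value 16 (_Float16 excess precision) should be treated like 0, i.e. float_t == float and double_t == double, per FX's reading.

/* Sketch of the math.h logic after the proposed darwin_flt_eval_method fix.  */
#if __FLT_EVAL_METHOD__ == 0 || __FLT_EVAL_METHOD__ == 16
typedef float float_t;
typedef double double_t;
#elif __FLT_EVAL_METHOD__ == 1
typedef double float_t;
typedef double double_t;
#elif __FLT_EVAL_METHOD__ == 2
typedef long double float_t;
typedef long double double_t;
#else
#error "Unsupported value of __FLT_EVAL_METHOD__."
#endif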
Re: [Intel SPR] Progress of GCC support for Intel SPR features
On Mon, Feb 7, 2022 at 11:16 AM LiYancheng via Gcc wrote: > > > On 2022/2/7 10:03, Andrew Pinski wrote: > > On Sun, Feb 6, 2022 at 5:59 PM LiYancheng via Gcc wrote: > >> Hello everyone! > >> > >> I have some questions to ask: > >> > >> 1. How does GCC support Sapphire Rapids CPU now? > >> > >> 2. Does GCC 11 fully support all the features of SPR? > >> From the release note, it seems that 5g ISA (fp16)/hfni is > >> not supported yet. > > It will be included in GCC 12 which should be released in less than 4 > > months. > Thank you for your reply! > >> 3. What is the simulation tool used by GCC to verify SPR characteristics? > >> Is it open source? > > Intel is doing the patching to GCC and binutils so I suspect they > > verify using their internal tools and I highly doubt it is free > > source. > > > > > > Thanks, > > Andrew Pinski > > > Any suggestions from Intel? > You can use Intel SDE (Software Development Emulator); refer to https://www.intel.com/content/www/us/en/developer/articles/tool/software-development-emulator.html. And please use GCC 12 (main trunk, not released yet) and binutils 2.38 (main trunk, not released yet). > Thanks! > > yancheng > > >> Thanks for all the help, > >> > >> yancheng > >> -- BR, Hongtao
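As a concrete (unofficial) starting point for the SDE suggestion above: build a small AVX512-FP16 test with a trunk GCC 12 and run it under SDE's Sapphire Rapids model. The gcc-12 command name, the -march=sapphirerapids choice, and the sde64 -spr invocation below are assumptions about the local setup and SDE version, not requirements stated in this thread.

/* fp16test.c -- exercises AVX512-FP16, one of the new SPR ISAs.
   Hypothetical build/run:
     gcc-12 -O2 -march=sapphirerapids fp16test.c -o fp16test
     sde64 -spr -- ./fp16test  */
#include <immintrin.h>
#include <stdio.h>

int
main (void)
{
  __m512h a = _mm512_set1_ph ((_Float16) 1.5);
  __m512h b = _mm512_set1_ph ((_Float16) 2.5);
  __m512h c = _mm512_add_ph (a, b);   /* should emit vaddph */
  printf ("%f\n", (double) c[0]);     /* GCC allows subscripting vector types */
  return 0;
}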
Re: x86: making better use of vpternlog{d,q}
On Wed, May 24, 2023 at 3:58 PM Jan Beulich via Gcc wrote: > > Hello, > > for a couple of years I was meaning to extend the use of these AVX512F > insns beyond the pretty minimalistic ones there are so far. Now that I've > got around to at least draft something, I ran into a couple of issues I > cannot explain. I'd like to start with understanding the unexpected > effects of a change to an existing insn I have made (reproduced at the > bottom). I certainly was prepared to observe testsuite failures, but it > ends up failing tests I didn't expect it would fail, and - upon looking > at sibling ones - also ends up leaving intact tests which I would expect > would then need adjustment (because of using the new alternative). > > In particular (all mentioned tests are in gcc.target/i386/) > - avx512f-andn-si-zmm-1.c (and its AVX512VL counterparts) fails because > for whatever reason generated code reverts back to using vpbroadcastd, > - avx512f-andn-di-zmm-1.c, otoh, is unaffected (i.e. continues to use > vpandnq with embedded broadcast), > - avx512f-andn-si-zmm-2.c doesn't use the new 4th insn alternative when > at the same time a made-up DI variant of the test (akin to what might > be an avx512f-andn-di-zmm-2.c testcase) does. > IOW: How is SI mode element size different here from DI mode one? Is > there anything wrong with the 4th alternative I'm adding, or is this > hinting at some anomaly elsewhere? __m512i is defined as __v8di; when it's used for _mm512_andnot_epi32, it's explicitly converted to (__v16si), which creates an extra subreg that is not needed for the DImode cases. And pass_combine tries to match the pattern below but fails due to the condition REG_P (operands[1]) || REG_P (operands[2]). Here I think you want register_operand instead of REG_P. (set (reg:V16SI 91) (and:V16SI (not:V16SI (subreg:V16SI (reg:V8DI 98) 0)) (vec_duplicate:V16SI (mem:SI (reg:DI 99) [1 *f_3(D)+0 S4 A32] > > Just to mention it, avx512f-andn-si-zmm-5.c similarly fails > unexpectedly, but I guess for the same reason (and there aren't AVX512VL > or DI mode element counterparts thereof). 
> > Jan > > --- a/gcc/config/i386/sse.md > +++ b/gcc/config/i386/sse.md > @@ -17019,11 +17019,11 @@ >"TARGET_AVX512F") > > (define_insn "*andnot3" > - [(set (match_operand:VI 0 "register_operand" "=x,x,v") > + [(set (match_operand:VI 0 "register_operand" "=x,x,v,v") > (and:VI > - (not:VI (match_operand:VI 1 "vector_operand" "0,x,v")) > - (match_operand:VI 2 "bcst_vector_operand" "xBm,xm,vmBr")))] > - "TARGET_SSE" > + (not:VI (match_operand:VI 1 "bcst_vector_operand" "0,x,v,mBr")) > + (match_operand:VI 2 "bcst_vector_operand" "xBm,xm,vmBr,v")))] > + "TARGET_SSE && (REG_P (operands[1]) || REG_P (operands[2]))" > { >char buf[64]; >const char *ops; > @@ -17090,6 +17090,11 @@ > case 2: >ops = "v%s%s\t{%%2, %%1, %%0|%%0, %%1, %%2}"; >break; > +case 3: > + tmp = "pternlog"; > + ssesuffix = ""; > + ops = "v%s%s\t{$0x44, %%1, %%2, %%0|%%0, %%2, %%1, $0x44}"; > + break; > default: >gcc_unreachable (); > } > @@ -17098,7 +17103,7 @@ >output_asm_insn (buf, operands); >return ""; > } > - [(set_attr "isa" "noavx,avx,avx") > + [(set_attr "isa" "noavx,avx,avx,avx512f") > (set_attr "type" "sselog") > (set (attr "prefix_data16") > (if_then_else > @@ -17106,7 +17111,7 @@ > (eq_attr "mode" "TI")) > (const_string "1") > (const_string "*"))) > - (set_attr "prefix" "orig,vex,evex") > + (set_attr "prefix" "orig,vex,evex,evex") > (set (attr "mode") > (cond [(match_test "TARGET_AVX2") > (const_string "") > @@ -17119,7 +17124,11 @@ > (match_test "optimize_function_for_size_p (cfun)")) > (const_string "V4SF") > ] > - (const_string "")))]) > + (const_string ""))) > + (set (attr "enabled") > + (if_then_else (eq_attr "alternative" "3") > + (symbol_ref " == 64 ? TARGET_AVX512F : > TARGET_AVX512VL") > + (const_string "*")))]) > > ;; PR target/100711: Split notl; vpbroadcastd; vpand as vpbroadcastd; vpandn > (define_split -- BR, Hongtao
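To see the SImode oddity Hongtao describes above in isolation, a pair of reduced functions along these lines is enough (my own sketch, built with -O2 -mavx512f; the actual gcc.target/i386/avx512f-andn-*-zmm-*.c tests differ in details). The epi32 variant goes through the (subreg:V16SI (reg:V8DI ...) 0) that combine has to look through, while the epi64 variant does not.

/* andn-si-vs-di.c -- illustrative only.  */
#include <immintrin.h>

__m512i
andn_si (__m512i x, const int *f)
{
  /* __m512i is a __v8di under the hood, so the epi32 intrinsic views the
     register as __v16si; that view is the extra subreg in the RTL above.  */
  return _mm512_andnot_epi32 (x, _mm512_set1_epi32 (*f));
}

__m512i
andn_di (__m512i x, const long long *f)
{
  /* Same element width as __m512i itself -- no subreg is needed.  */
  return _mm512_andnot_epi64 (x, _mm512_set1_epi64 (*f));
}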
Re: /home/toon/compilers/gcc/libgfortran/generated/matmul_i1.c:1781:1: internal compiler error: RTL check: expected elt 0 type 'i' or 'n', have 'w' (rtx const_int) in vpternlog_redundant_operand_mask,
On Mon, Aug 7, 2023 at 2:08 AM Toon Moene wrote: > > Wonder if I am the only one to see this: > > https://gcc.gnu.org/pipermail/gcc-testresults/2023-August/792616.html > > To quote: > > during RTL pass: split1 > /home/toon/compilers/gcc/libgfortran/generated/matmul_i1.c: In function > 'matmul_i1_avx512f': > /home/toon/compilers/gcc/libgfortran/generated/matmul_i1.c:1781:1: > internal compiler error: RTL check: expected elt 0 type 'i' or 'n', have > 'w' (rtx const_int) in vpternlog_redundant_operand_mask, at > config/i386/i386.cc:19460 > 1781 | } >| ^ > during RTL pass: split1 > /home/toon/compilers/gcc/libgfortran/generated/matmul_i2.c: In function > 'matmul_i2_avx512f': > /home/toon/compilers/gcc/libgfortran/generated/matmul_i2.c:1781:1: > internal compiler error: RTL check: expected elt 0 type 'i' or 'n', have > 'w' (rtx const_int) in vpternlog_redundant_operand_mask, at > config/i386/i386.cc:19460 > 1781 | } >| ^ > 0x7a5cc7 rtl_check_failed_type2(rtx_def const*, int, int, int, char > const*, int, char const*) > /home/toon/compilers/gcc/gcc/rtl.cc:761 > 0x82bf8d vpternlog_redundant_operand_mask(rtx_def**) > /home/toon/compilers/gcc/gcc/config/i386/i386.cc:19460 > 0x1f1295b split_44 > /home/toon/compilers/gcc/gcc/config/i386/sse.md:12730 > 0x1f1295b split_63 > /home/toon/compilers/gcc/gcc/config/i386/sse.md:28428 > 0xe7663b try_split(rtx_def*, rtx_insn*, int) > /home/toon/compilers/gcc/gcc/emit-rtl.cc:3800 > 0xe76cff try_split(rtx_def*, rtx_insn*, int) > /home/toon/compilers/gcc/gcc/emit-rtl.cc:3972 > 0x11b2938 split_insn > /home/toon/compilers/gcc/gcc/recog.cc:3385 > 0x11b2eff split_all_insns() > /home/toon/compilers/gcc/gcc/recog.cc:3489 > 0x11dd9c8 execute > /home/toon/compilers/gcc/gcc/recog.cc:4413 > Please submit a full bug report, with preprocessed source (by using > -freport-bug). > Please include the complete backtrace with any bug report. > See <https://gcc.gnu.org/bugs/> for instructions. > make[3]: *** [Makefile:4584: matmul_i1.lo] Error 1 > make[3]: *** Waiting for unfinished jobs > > -- > Toon Moene - e-mail: t...@moene.org - phone: +31 346 214290 > Saturnushof 14, 3738 XG Maartensdijk, The Netherlands Looks like related to https://gcc.gnu.org/g:567d06bb357a39ece865cef67ada44124f227e45 commit r14-2999-g567d06bb357a39ece865cef67ada44124f227e45 Author: Yan Simonaytes Date: Tue Jul 25 20:43:19 2023 +0300 i386: eliminate redundant operands of VPTERNLOG As mentioned in PR 110202, GCC may be presented with input where control word of the VPTERNLOG intrinsic implies that some of its operands do not affect the result. In that case, we can eliminate redundant operands of the instruction by substituting any other operand in their place. This removes false dependencies. -- BR, Hongtao
Re: /home/toon/compilers/gcc/libgfortran/generated/matmul_i1.c:1781:1: internal compiler error: RTL check: expected elt 0 type 'i' or 'n', have 'w' (rtx const_int) in vpternlog_redundant_operand_mask,
On Mon, Aug 7, 2023 at 9:35 AM Hongtao Liu wrote: > > On Mon, Aug 7, 2023 at 2:08 AM Toon Moene wrote: > > > > Wonder if I am the only one to see this: > > > > https://gcc.gnu.org/pipermail/gcc-testresults/2023-August/792616.html Could you share your GCC configure options? I guess --enable-checking=yes,rtl,extra is key to reproducing the issue. > > > > To quote: > > > > during RTL pass: split1 > > /home/toon/compilers/gcc/libgfortran/generated/matmul_i1.c: In function > > 'matmul_i1_avx512f': > > /home/toon/compilers/gcc/libgfortran/generated/matmul_i1.c:1781:1: > > internal compiler error: RTL check: expected elt 0 type 'i' or 'n', have > > 'w' (rtx const_int) in vpternlog_redundant_operand_mask, at > > config/i386/i386.cc:19460 > > 1781 | } > >| ^ > > during RTL pass: split1 > > /home/toon/compilers/gcc/libgfortran/generated/matmul_i2.c: In function > > 'matmul_i2_avx512f': > > /home/toon/compilers/gcc/libgfortran/generated/matmul_i2.c:1781:1: > > internal compiler error: RTL check: expected elt 0 type 'i' or 'n', have > > 'w' (rtx const_int) in vpternlog_redundant_operand_mask, at > > config/i386/i386.cc:19460 > > 1781 | } > >| ^ > > 0x7a5cc7 rtl_check_failed_type2(rtx_def const*, int, int, int, char > > const*, int, char const*) > > /home/toon/compilers/gcc/gcc/rtl.cc:761 > > 0x82bf8d vpternlog_redundant_operand_mask(rtx_def**) > > /home/toon/compilers/gcc/gcc/config/i386/i386.cc:19460 > > 0x1f1295b split_44 > > /home/toon/compilers/gcc/gcc/config/i386/sse.md:12730 > > 0x1f1295b split_63 > > /home/toon/compilers/gcc/gcc/config/i386/sse.md:28428 > > 0xe7663b try_split(rtx_def*, rtx_insn*, int) > > /home/toon/compilers/gcc/gcc/emit-rtl.cc:3800 > > 0xe76cff try_split(rtx_def*, rtx_insn*, int) > > /home/toon/compilers/gcc/gcc/emit-rtl.cc:3972 > > 0x11b2938 split_insn > > /home/toon/compilers/gcc/gcc/recog.cc:3385 > > 0x11b2eff split_all_insns() > > /home/toon/compilers/gcc/gcc/recog.cc:3489 > > 0x11dd9c8 execute > > /home/toon/compilers/gcc/gcc/recog.cc:4413 > > Please submit a full bug report, with preprocessed source (by using > > -freport-bug). > > Please include the complete backtrace with any bug report. > > See <https://gcc.gnu.org/bugs/> for instructions. > > make[3]: *** [Makefile:4584: matmul_i1.lo] Error 1 > > make[3]: *** Waiting for unfinished jobs > > > > -- > > Toon Moene - e-mail: t...@moene.org - phone: +31 346 214290 > > Saturnushof 14, 3738 XG Maartensdijk, The Netherlands > > Looks like related to > > https://gcc.gnu.org/g:567d06bb357a39ece865cef67ada44124f227e45 > > commit r14-2999-g567d06bb357a39ece865cef67ada44124f227e45 > Author: Yan Simonaytes > Date: Tue Jul 25 20:43:19 2023 +0300 > > i386: eliminate redundant operands of VPTERNLOG > > As mentioned in PR 110202, GCC may be presented with input where control > word of the VPTERNLOG intrinsic implies that some of its operands do not > affect the result. In that case, we can eliminate redundant operands > of the instruction by substituting any other operand in their place. > This removes false dependencies. > > > -- > BR, > Hongtao -- BR, Hongtao
Re: /home/toon/compilers/gcc/libgfortran/generated/matmul_i1.c:1781:1: internal compiler error: RTL check: expected elt 0 type 'i' or 'n', have 'w' (rtx const_int) in vpternlog_redundant_operand_mask,
On Mon, Aug 7, 2023 at 9:38 AM Hongtao Liu wrote: > > On Mon, Aug 7, 2023 at 9:35 AM Hongtao Liu wrote: > > > > On Mon, Aug 7, 2023 at 2:08 AM Toon Moene wrote: > > > > > > Wonder if I am the only one to see this: > > > > > > https://gcc.gnu.org/pipermail/gcc-testresults/2023-August/792616.html > Could you share your GCC configure options? I guess > --enable-checking=yes,rtl,extra is key to reproducing the issue. Reproduced with --with-cpu=native --with-arch=native --enable-checking=yes,rtl,extra on an AVX512 machine. So on a non-AVX512 machine, --with-cpu=cascadelake --with-arch=cascadelake --enable-checking=yes,rtl,extra should be enough. > > > > > > To quote: > > > > > > during RTL pass: split1 > > > /home/toon/compilers/gcc/libgfortran/generated/matmul_i1.c: In function > > > 'matmul_i1_avx512f': > > > /home/toon/compilers/gcc/libgfortran/generated/matmul_i1.c:1781:1: > > > internal compiler error: RTL check: expected elt 0 type 'i' or 'n', have > > > 'w' (rtx const_int) in vpternlog_redundant_operand_mask, at > > > config/i386/i386.cc:19460 > > > 1781 | } > > >| ^ > > > during RTL pass: split1 > > > /home/toon/compilers/gcc/libgfortran/generated/matmul_i2.c: In function > > > 'matmul_i2_avx512f': > > > /home/toon/compilers/gcc/libgfortran/generated/matmul_i2.c:1781:1: > > > internal compiler error: RTL check: expected elt 0 type 'i' or 'n', have > > > 'w' (rtx const_int) in vpternlog_redundant_operand_mask, at > > > config/i386/i386.cc:19460 > > > 1781 | } > > >| ^ > > > 0x7a5cc7 rtl_check_failed_type2(rtx_def const*, int, int, int, char > > > const*, int, char const*) > > > /home/toon/compilers/gcc/gcc/rtl.cc:761 > > > 0x82bf8d vpternlog_redundant_operand_mask(rtx_def**) > > > /home/toon/compilers/gcc/gcc/config/i386/i386.cc:19460 > > > 0x1f1295b split_44 > > > /home/toon/compilers/gcc/gcc/config/i386/sse.md:12730 > > > 0x1f1295b split_63 > > > /home/toon/compilers/gcc/gcc/config/i386/sse.md:28428 > > > 0xe7663b try_split(rtx_def*, rtx_insn*, int) > > > /home/toon/compilers/gcc/gcc/emit-rtl.cc:3800 > > > 0xe76cff try_split(rtx_def*, rtx_insn*, int) > > > /home/toon/compilers/gcc/gcc/emit-rtl.cc:3972 > > > 0x11b2938 split_insn > > > /home/toon/compilers/gcc/gcc/recog.cc:3385 > > > 0x11b2eff split_all_insns() > > > /home/toon/compilers/gcc/gcc/recog.cc:3489 > > > 0x11dd9c8 execute > > > /home/toon/compilers/gcc/gcc/recog.cc:4413 > > > Please submit a full bug report, with preprocessed source (by using > > > -freport-bug). > > > Please include the complete backtrace with any bug report. > > > See <https://gcc.gnu.org/bugs/> for instructions. > > > make[3]: *** [Makefile:4584: matmul_i1.lo] Error 1 > > > make[3]: *** Waiting for unfinished jobs > > > > > > -- > > > Toon Moene - e-mail: t...@moene.org - phone: +31 346 214290 > > > Saturnushof 14, 3738 XG Maartensdijk, The Netherlands > > > > Looks like related to > > > > https://gcc.gnu.org/g:567d06bb357a39ece865cef67ada44124f227e45 > > > > commit r14-2999-g567d06bb357a39ece865cef67ada44124f227e45 > > Author: Yan Simonaytes > > Date: Tue Jul 25 20:43:19 2023 +0300 > > > > i386: eliminate redundant operands of VPTERNLOG > > > > As mentioned in PR 110202, GCC may be presented with input where control > > word of the VPTERNLOG intrinsic implies that some of its operands do not > > affect the result. In that case, we can eliminate redundant operands > > of the instruction by substituting any other operand in their place. > > This removes false dependencies. > > > > > > -- > > BR, > > Hongtao > > > > -- > BR, > Hongtao -- BR, Hongtao
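For anyone else trying to reproduce this without AVX512 hardware, the ingredients above amount to roughly the following; only the --with-arch/--with-cpu/--enable-checking flags come from this thread, the rest of the configure line and the make invocation are the usual boilerplate and are illustrative only.

../gcc/configure --with-arch=cascadelake --with-cpu=cascadelake \
    --enable-checking=yes,rtl,extra --enable-languages=c,fortran
make -j"$(nproc)"   # the ICE fires while building libgfortran's generated
                    # matmul_i*.c, during the split1 pass

The RTL checking enabled by --enable-checking=yes,rtl,extra is what turns the mistyped operand access in vpternlog_redundant_operand_mask into a hard error; without it the problem would most likely go unnoticed in a normal bootstrap.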
Re: Question about generating vpmovzxbd instruction without using the interfaces in immintrin.h
On Fri, May 31, 2024 at 10:58 AM Hanke Zhang via Gcc wrote: > > Hi, > I've recently been trying to hand-write code to trigger automatic > vectorization optimizations in GCC on Intel x86 machines (without > using the interfaces in immintrin.h), but I'm running into a problem > where I can't seem to get the concise `vpmovzxbd` or similar > instructions. > > My requirement is to convert 8 `uint8_t` elements to `int32_t` type > and print the output. If I use the interface (_mm256_cvtepu8_epi32) in > immintrin.h, the code is as follows: > > int immintrin () { > int size = 1, offset = 3; > uint8_t* a = malloc(sizeof(char) * size); > > __v8si b = (__v8si)_mm256_cvtepu8_epi32(*(__m128i *)(a + offset)); > > for (int i = 0; i < 8; i++) { > printf("%d\n", b[i]); > } > } > > After compiling with -mavx2 -O3, you can get concise and efficient > instructions. (You can see it here: https://godbolt.org/z/8ojzdav47) > > But if I do not use this interface and instead use a for-loop or the > `__builtin_convertvector` interface provided by GCC, I cannot achieve > the above effect. The code is as follows: > > typedef uint8_t v8qiu __attribute__ ((__vector_size__ (8))); > int forloop () { > int size = 1, offset = 3; > uint8_t* a = malloc(sizeof(char) * size); > > v8qiu av = *(v8qiu *)(a + offset); > __v8si b = {}; > for (int i = 0; i < 8; i++) { > b[i] = (a + offset)[i]; > } > > for (int i = 0; i < 8; i++) { > printf("%d\n", b[i]); > } > } > > int builtin_cvt () { > int size = 1, offset = 3; > uint8_t* a = malloc(sizeof(char) * size); > > v8qiu av = *(v8qiu *)(a + offset); > __v8si b = __builtin_convertvector(av, __v8si); > > for (int i = 0; i < 8; i++) { > printf("%d\n", b[i]); > } > } > > The instructions generated by both functions are redundant and > complex, and are quite difficult to read compared to calling > `_mm256_cvtepu8_epi32` directly. (You can see it here as well: > https://godbolt.org/z/8ojzdav47) > > What I want to ask is: How should I write the source code to get > assembly instructions similar to directly calling > _mm256_cvtepu8_epi32? > > Or would it be easier if I modified the GIMPLE directly? But it seems > that there is no relevant expression or interface directly > corresponding to `vpmovzxbd` in GIMPLE. https://gcc.gnu.org/pipermail/gcc-patches/2024-May/652484.html We're working on the patch linked above to optimize __builtin_convertvector; after that it can be as optimal as the Intel intrinsic. > > Thanks > Hanke Zhang -- BR, Hongtao
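Until the patch linked above lands, the reliable way to get a single vpmovzxbd remains the intrinsic. For reference, here is a self-contained variant of the builtin_cvt() version with the out-of-bounds read removed (the original allocates 1 byte and then loads 8 bytes at offset 3). The function and type names are mine, and the expectation that this eventually compiles to the same code as _mm256_cvtepu8_epi32 is based on the patch referenced above, not on current trunk behaviour.

/* cvt.c -- build: gcc -O2 -mavx2 -c cvt.c  */
#include <stdint.h>
#include <stdio.h>

typedef uint8_t v8qu __attribute__ ((vector_size (8)));
typedef int32_t v8si __attribute__ ((vector_size (32)));

void
cvt_u8_to_s32 (const uint8_t *a)              /* a must point at >= 8 valid bytes */
{
  v8qu av;
  __builtin_memcpy (&av, a, sizeof (av));     /* unaligned-safe 8-byte load */
  v8si b = __builtin_convertvector (av, v8si);  /* zero-extend u8 -> s32 */

  for (int i = 0; i < 8; i++)
    printf ("%d\n", b[i]);
}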