On Tue, Jun 20, 2023 at 11:56 AM Prathamesh Kulkarni via Gcc-patches
<[email protected]> wrote:
>
> Hi Richard,
> For the following reduced test-case taken from PR:
>
> #include "arm_sve.h"
> svuint32_t l() {
> alignas(16) const unsigned int lanes[4] = {0, 0, 0, 0};
> return svld1rq_u32(svptrue_b8(), lanes);
> }
>
> compiling with -O3 -mcpu=generic+sve results in following ICE:
> during GIMPLE pass: fre
> pr110280.c: In function 'l':
> pr110280.c:5:1: internal compiler error: in eliminate_stmt, at
> tree-ssa-sccvn.cc:6890
> 5 | }
> | ^
> 0x865fb1 eliminate_dom_walker::eliminate_stmt(basic_block_def*,
> gimple_stmt_iterator*)
> ../../gcc/gcc/tree-ssa-sccvn.cc:6890
> 0x120bf4d eliminate_dom_walker::before_dom_children(basic_block_def*)
> ../../gcc/gcc/tree-ssa-sccvn.cc:7324
> 0x120bf4d eliminate_dom_walker::before_dom_children(basic_block_def*)
> ../../gcc/gcc/tree-ssa-sccvn.cc:7257
> 0x1aeec77 dom_walker::walk(basic_block_def*)
> ../../gcc/gcc/domwalk.cc:311
> 0x11fd924 eliminate_with_rpo_vn(bitmap_head*)
> ../../gcc/gcc/tree-ssa-sccvn.cc:7504
> 0x1214664 do_rpo_vn_1
> ../../gcc/gcc/tree-ssa-sccvn.cc:8616
> 0x1215ba5 execute
> ../../gcc/gcc/tree-ssa-sccvn.cc:8702
>
> cc1 simplifies:
> lanes[0] = 0;
> lanes[1] = 0;
> lanes[2] = 0;
> lanes[3] = 0;
> _1 = { -1, ... };
> _7 = svld1rq_u32 (_1, &lanes);
>
> to:
> _9 = MEM <vector(4) unsigned int> [(unsigned int * {ref-all})&lanes];
> _7 = VEC_PERM_EXPR <_9, _9, { 0, 1, 2, 3, ... }>;
>
> and then fre1 dump shows:
> Applying pattern match.pd:8675, generic-match-5.cc:9025
> Match-and-simplified VEC_PERM_EXPR <_9, _9, { 0, 1, 2, 3, ... }> to {
> 0, 0, 0, 0 }
> RHS VEC_PERM_EXPR <_9, _9, { 0, 1, 2, 3, ... }> simplified to { 0, 0, 0, 0 }
>
> The issue seems to be with the following pattern:
> (simplify
> (vec_perm vec_same_elem_p@0 @0 @1)
> @0)
>
> which simplifies above VEC_PERM_EXPR to:
> _7 = {0, 0, 0, 0}
> which is incorrect since _9 and mask have different vector lengths.
>
> The attached patch amends the pattern to simplify above VEC_PERM_EXPR
> only if operand and mask have same number of elements, which seems to fix
> the issue, and we're left with the following in .optimized dump:
> <bb 2> [local count: 1073741824]:
> _2 = VEC_PERM_EXPR <{ 0, 0, 0, 0 }, { 0, 0, 0, 0 }, { 0, 1, 2, 3, ... }>;
It would be nice to have this optimized.
-
(simplify
(vec_perm vec_same_elem_p@0 @0 @1)
- @0)
+ (if (known_eq (TYPE_VECTOR_SUBPARTS (TREE_TYPE (@0)),
+ TYPE_VECTOR_SUBPARTS (TREE_TYPE (@1))))
+ @0))
That looks good, I think. It might be even better to use 'type' instead of
TREE_TYPE (@1), since that is more obviously the return type, in which case
(if (types_match (type, TREE_TYPE (@0)))
would be more to the point.
But to simplify this in the !known_eq case as well, can't you just do a simple
{ build_vector_from_val (type, the-element); }
? The 'vec_same_elem_p' predicate doesn't give you the element, so
(with { tree el = uniform_vector_p (@0); }
(if (el)
{ build_vector_from_val (type, el); })))
would be the cheapest workaround.
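To illustrate why rebuilding the vector in the result type is safe where returning @0 is not, here is a toy C++ model. It is only a sketch and not GCC code: Lanes, uniform_element, vec_perm and fold_uniform_perm are hypothetical names, and a fixed 8-lane result stands in for the variable-length SVE vector.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Toy stand-in for a vector value: just its lanes (not a GCC tree).
using Lanes = std::vector<unsigned>;

// Rough analogue of uniform_vector_p: return the repeated element if
// every lane is equal, otherwise nothing.
std::optional<unsigned> uniform_element(const Lanes &v) {
  if (v.empty())
    return std::nullopt;
  for (unsigned x : v)
    if (x != v.front())
      return std::nullopt;
  return v.front();
}

// Reference semantics of a two-input permute: the result has one lane
// per selector entry, so its length follows the selector, not the inputs.
Lanes vec_perm(const Lanes &a, const Lanes &b,
               const std::vector<std::size_t> &sel) {
  assert(a.size() == b.size() && !a.empty());
  Lanes out;
  out.reserve(sel.size());
  for (std::size_t i : sel) {
    std::size_t idx = i % (2 * a.size());  // selector indexes a followed by b
    out.push_back(idx < a.size() ? a[idx] : b[idx - a.size()]);
  }
  return out;
}

// Analogue of the suggested fold: when both inputs are the same uniform
// vector, build a fresh vector of the *result* length from the element
// instead of returning the (possibly shorter) input.
std::optional<Lanes> fold_uniform_perm(const Lanes &a,
                                       std::size_t result_lanes) {
  if (auto el = uniform_element(a))
    return Lanes(result_lanes, *el);
  return std::nullopt;
}
```

With a = {0, 0, 0, 0} and an 8-entry selector, fold_uniform_perm yields eight zero lanes, matching vec_perm (a, a, sel), whereas handing back a itself would have the wrong number of lanes.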
> return _2;
>
> code-gen:
> l:
> mov z0.b, #0
> ret
>
> Patch is bootstrapped+tested on aarch64-linux-gnu.
> OK to commit ?
>
> Thanks,
> Prathamesh