https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112722
Bug ID: 112722
Summary: RISC-V: ICE on tree-vect-slp.cc:8029 for -march=rv64gc_zve64d
Product: gcc
Version: 14.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
Assignee: unassigned at gcc dot gnu.org
Reporter: juzhe.zhong at rivai dot ai
Target Milestone: ---
Hi, Richards.
Recent testing exposed an ICE for RVV.
Here is the test case:
#define VECTOR_BITS 512
#define N (VECTOR_BITS * 11 / 64 + 4)
#define add(A, B) ((A) + (B))
#define DEF(OP) \
  void __attribute__ ((noipa)) \
  f_##OP (double *restrict a, double *restrict b, double x) \
  { \
    for (int i = 0; i < N; i += 2) \
      { \
        a[i] = b[i] < 100 ? OP (b[i], x) : b[i]; \
        a[i + 1] = b[i + 1] < 70 ? OP (b[i + 1], x) : b[i + 1]; \
      } \
  }
#define TEST(OP) \
  { \
    f_##OP (a, b, 10); \
    _Pragma("GCC novector") \
    for (int i = 0; i < N; ++i) \
      { \
        int bval = (i % 17) * 10; \
        int truev = OP (bval, 10); \
        if (a[i] != (bval < (i & 1 ? 70 : 100) ? truev : bval)) \
          __builtin_abort (); \
        asm volatile ("" ::: "memory"); \
      } \
  }
#define FOR_EACH_OP(T) \
  T (add)

FOR_EACH_OP (DEF)
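The test case defines TEST, but the driver that calls it is not shown here. As a rough host-side sketch of what the check expects, the scalar version below assumes the input is b[i] = (i % 17) * 10 (inferred from bval in TEST; the real driver is not part of this report). The names f_add_ref and count_mismatches are hypothetical.

```c
#define VECTOR_BITS 512
#define N (VECTOR_BITS * 11 / 64 + 4)   /* 512 * 11 / 64 + 4 = 92 */

/* Scalar equivalent of f_add from the test case (hypothetical name).  */
void
f_add_ref (double *restrict a, const double *restrict b, double x)
{
  for (int i = 0; i < N; i += 2)
    {
      a[i] = b[i] < 100 ? b[i] + x : b[i];
      a[i + 1] = b[i + 1] < 70 ? b[i + 1] + x : b[i + 1];
    }
}

/* Hypothetical driver: fill b with the ASSUMED input pattern, run the
   scalar loop and count elements that differ from what TEST expects.  */
int
count_mismatches (void)
{
  double a[N], b[N];
  for (int i = 0; i < N; ++i)
    b[i] = (i % 17) * 10;   /* assumption, inferred from bval in TEST */

  f_add_ref (a, b, 10);

  int bad = 0;
  for (int i = 0; i < N; ++i)
    {
      int bval = (i % 17) * 10;
      int truev = bval + 10;
      if (a[i] != (bval < (i & 1 ? 70 : 100) ? truev : bval))
        ++bad;
    }
  return bad;
}
```

Even elements use the threshold 100 and odd elements use 70, so each pair of adjacent elements carries two different conditions.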
Compile options:
-march=rv64gc_zve64d_zvfh_zfh -mabi=lp64d -fdiagnostics-plain-output -flto
-ffat-lto-objects -ftree-vectorize -fno-tree-loop-distribute-patterns
-fno-vect-cost-model -fno-common -O3 -fdump-tree-vect-details
https://godbolt.org/z/GT4bW4Tno
The ICE happens because of this line at tree-vect-slp.cc:8028:
unsigned int partial_nelts = nelts / nvectors;
With nelts = 2 and nvectors = 4, partial_nelts = 0, which triggers the ICE.
nelts = 2 looks reasonable since slp_instances.length () = 2.
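The arithmetic can be reproduced in isolation. This is only a sketch of the quoted division, not the surrounding GCC code; toy_partial_nelts is a hypothetical name.

```c
/* Mirrors the quoted statement: unsigned integer division truncates,
   so any nvectors larger than nelts gives 0.  Hypothetical helper,
   not a GCC function.  */
unsigned int
toy_partial_nelts (unsigned int nelts, unsigned int nvectors)
{
  return nelts / nvectors;
}
```

Calling toy_partial_nelts (2, 4) gives 0, which is the value seen at the point of the ICE.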
nvectors is calculated by can_duplicate_and_interleave_p.
I dug into can_duplicate_and_interleave_p, but the code is hard for me to
understand.
Here is the relevant part of can_duplicate_and_interleave_p:
for (;;)
  {
    ...
    if (int_mode_for_size (elt_bits, 1).exists (&int_mode))
      {
        if (vector_type
            && VECTOR_MODE_P (TYPE_MODE (vector_type))
            && known_eq (GET_MODE_SIZE (TYPE_MODE (vector_type)),
                         GET_MODE_SIZE (base_vector_mode))
            && multiple_p (GET_MODE_NUNITS (TYPE_MODE (vector_type)),
                           2, &half_nelts))
          {
            nvectors *= 2;
          }
In the first iteration of the for loop, int_mode = TImode.
Since RVV has no vector TI mode, nvectors becomes 2.
In the second iteration, int_mode = DImode.
RVV has a vector DI mode, RVVM1DImode, with nunits = poly (1,1).
Since that doesn't satisfy the condition
"multiple_p (GET_MODE_NUNITS (TYPE_MODE (vector_type)), 2, &half_nelts)",
the loop continues and nvectors becomes 4.
In the third (and last) iteration, int_mode = SImode. RVV has RVVM1SImode,
which has nunits = (2,2), so the function returns true.
So nvectors = 4, *nvectors_out is set to 4, and we hit the ICE.
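The three rounds above can be replayed with a toy model. This is a sketch under stated assumptions, not the GCC implementation: the toy_* names are hypothetical, a poly value is modelled as two coefficients (c0 + c1 * x), the known_eq size check is left out, and nvectors is assumed to start at 1 and to double each time a mode is rejected, as the description implies.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a poly_int: value = c0 + c1 * x, with x unknown
   at compile time.  */
struct toy_poly { unsigned c0, c1; };

/* Toy stand-in for one candidate integer element mode.  */
struct toy_mode
{
  const char *name;        /* e.g. "TImode" */
  bool has_vector_mode;    /* does the target have a vector of it?  */
  struct toy_poly nunits;  /* number of units of that vector mode */
};

/* A poly value is a known multiple of 2 only if every coefficient
   is even.  */
bool
toy_multiple_of_2 (struct toy_poly p)
{
  return p.c0 % 2 == 0 && p.c1 % 2 == 0;
}

/* Walk the candidate modes from widest to narrowest.  ASSUMPTION:
   nvectors starts at 1 and doubles whenever a mode is rejected.  */
bool
toy_can_duplicate_and_interleave (const struct toy_mode *modes, size_t n,
                                  unsigned *nvectors_out)
{
  unsigned nvectors = 1;
  for (size_t i = 0; i < n; ++i)
    {
      if (modes[i].has_vector_mode && toy_multiple_of_2 (modes[i].nunits))
        {
          *nvectors_out = nvectors;
          return true;
        }
      nvectors *= 2;
    }
  return false;
}

/* The three rounds described in the report for rv64gc_zve64d.  */
unsigned
toy_rvv_nvectors (void)
{
  const struct toy_mode rounds[] = {
    { "TImode", false, { 0, 0 } },  /* no vector TI mode */
    { "DImode", true,  { 1, 1 } },  /* RVVM1DImode, nunits (1,1) */
    { "SImode", true,  { 2, 2 } },  /* RVVM1SImode, nunits (2,2) */
  };
  unsigned nvectors = 0;
  toy_can_duplicate_and_interleave (rounds, 3, &nvectors);
  return nvectors;
}
```

In this model toy_rvv_nvectors () returns 4, and dividing nelts = 2 by that result gives 0, matching the values reported above.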
I am struggling to fix this ICE for RVV and have failed to find an appropriate
approach.
Do we need to work around it in the RISC-V backend (disable all poly (1,1) mode
vectorization)? I believe that may fix the issue, but I am not sure.
Or should it be fixed in the middle-end?
Thanks.