> from output_constant_pool_2 and make it defer to native_encode_rtx
> instead. That seems like the most direct way of avoiding mishaps.
>
> E.g. another way in which different routines could make different choices
> is in whether, for SVE's VNx8BI (say), they fill the upper bit of each
> 2-bit element with 0s or with a copy of the low bit. Both choices are
> valid in principle, and sharing the same code between both routines
> would make sure that they make the same choice.
Thanks. I went with your approach in the attached patch.
It was bootstrapped and regtested on x86 and power10; aarch64 is still
running. It was also regtested on rv64gcv_zvl512b.
Regards
Robin
[PATCH] varasm: Use native_encode_rtx for constant vectors.
optimize_constant_pool hashes vector masks by native_encode_rtx and
merges identically hashed values in the constant pool. Afterwards the
optimized values are written in output_constant_pool_2.
However, native_encode_rtx and output_constant_pool_2 disagree in their
encoding of vector masks: native_encode_rtx does not pad with zeroes
while output_constant_pool_2 implicitly does.
In RVV's shuffle-evenodd-run.c there are two masks,
(a) "0101" for V4BI and
(b) "01010101" for V8BI,
that have the same representation/encoding ("1010101") in native_encode_rtx.
output_constant_pool_2, however, uses "101" for (a) and "1010101" for (b).
Now, optimize_constant_pool might happen to merge both masks using
(a) as representative. Then, output_constant_pool_2 will output "1010",
which is only valid for the first mask because the implicit zero padding
doesn't agree with (b).
(b)'s "1010101" works for both masks as a V4BI load will ignore the last four
padding bits.
This patch makes output_constant_pool_2 use native_encode_rtx so both
functions will agree on an encoding and output the correct constant.
gcc/ChangeLog:
* varasm.cc (output_constant_pool_2): Use native_encode_rtx for
building the memory image of a const vector mask.
---
gcc/varasm.cc | 32 ++++++++++++--------------------
1 file changed, 12 insertions(+), 20 deletions(-)
diff --git a/gcc/varasm.cc b/gcc/varasm.cc
index 0068ec2ce4d..584d1864077 100644
--- a/gcc/varasm.cc
+++ b/gcc/varasm.cc
@@ -4301,34 +4301,26 @@ output_constant_pool_2 (fixed_size_mode mode, rtx x,
unsigned int align)
{
gcc_assert (GET_CODE (x) == CONST_VECTOR);
- /* Pick the smallest integer mode that contains at least one
- whole element. Often this is byte_mode and contains more
- than one element. */
+ auto_vec<target_unit, 128> buffer;
+ buffer.truncate (0);
+ buffer.reserve (GET_MODE_SIZE (mode));
+
+ bool ok = native_encode_rtx (mode, x, buffer, 0, GET_MODE_SIZE (mode));
+ gcc_assert (ok);
+
unsigned int nelts = GET_MODE_NUNITS (mode);
unsigned int elt_bits = GET_MODE_PRECISION (mode) / nelts;
unsigned int int_bits = MAX (elt_bits, BITS_PER_UNIT);
scalar_int_mode int_mode = int_mode_for_size (int_bits, 0).require ();
- unsigned int mask = GET_MODE_MASK (GET_MODE_INNER (mode));
- /* We allow GET_MODE_PRECISION (mode) <= GET_MODE_BITSIZE (mode) but
- only properly handle cases where the difference is less than a
- byte. */
- gcc_assert (GET_MODE_BITSIZE (mode) - GET_MODE_PRECISION (mode) <
- BITS_PER_UNIT);
-
- /* Build the constant up one integer at a time. */
- unsigned int elts_per_int = int_bits / elt_bits;
- for (unsigned int i = 0; i < nelts; i += elts_per_int)
+ for (unsigned i = 0;
+ i < GET_MODE_SIZE (mode);
+ i += GET_MODE_SIZE (int_mode))
{
unsigned HOST_WIDE_INT value = 0;
- unsigned int limit = MIN (nelts - i, elts_per_int);
- for (unsigned int j = 0; j < limit; ++j)
- {
- auto elt = INTVAL (CONST_VECTOR_ELT (x, i + j));
- value |= (elt & mask) << (j * elt_bits);
- }
+ memcpy (&value, buffer.address () + i, GET_MODE_SIZE (int_mode));
output_constant_pool_2 (int_mode, gen_int_mode (value, int_mode),
- i != 0 ? MIN (align, int_bits) : align);
+ align);
}
break;
}
--
2.47.1