On Thu, 3 Jul 2025 06:10:28 GMT, Xiaohong Gong <xg...@openjdk.org> wrote:
>> ### Background
>> On AArch64, the minimum vector length supported is 64-bit for basic types,
>> except for `byte` and `boolean` (32-bit and 16-bit respectively to match
>> special Vector API features). This limitation prevents intrinsification of
>> vector type conversions between `short` and wider types (e.g. `long/double`)
>> in Vector API when the entire vector length is within 128 bits, resulting in
>> degraded performance for such conversions.
>>
>> For example, type conversions between `ShortVector.SPECIES_128` and
>> `LongVector.SPECIES_128` are not supported on AArch64 NEON and SVE
>> architectures with 128-bit max vector size. This occurs because the compiler
>> would need to generate a vector with 2 short elements, resulting in a 32-bit
>> vector size.
>>
>> To intrinsify such type conversion APIs, we need to relax the min vector
>> length constraint from 64-bit to 32-bit for short vectors.
>>
>> ### Impact Analysis
>> #### 1. Vector types
>> Vectors only with `short` element types will be affected, as we just
>> supported 32-bit `short` vectors in this change.
>>
>> #### 2. Vector API
>> No impact on Vector API or the vector-specific nodes. The minimum vector
>> shape at API level remains 64-bit. It's not possible to generate a final
>> vector IR with 32-bit vector size. Type conversions may generate
>> intermediate 32-bit vectors, but they will be resized or cast to vectors
>> with at least 64-bit length.
>>
>> #### 3. Auto-vectorization
>> Enables vectorization of cases containing only 2 `short` lanes, with
>> significant performance improvements. Since we have supported 32-bit vectors
>> for `byte` type for a long time, extending this to `short` did not introduce
>> additional risks.
>>
>> #### 4. Codegen of vector nodes
>> NEON doesn't support 32-bit SIMD instructions, so we use 64-bit instructions
>> instead. For lanewise operations, this is safe because the higher half bits
>> can be ignored.
>>
>> Details:
>> - Lanewise vector operations are unaffected as explained above.
>> - NEON supports vector load/store instructions with 32-bit vector size,
>> which we already use in relevant IRs (shared by SVE).
>> - Cross-lane operations like reduction may be affected, potentially causing
>> incorrect results for `min/max/mul/and` reductions. The min vector size for
>> such operations should remain 64-bit. We've added assertions in match rules.
>> Since it's currently not possible to generate such reductions (Vector API
>> minimum is 64-bit, and SLP doesn't support subword type reductions), we
>> maintain the status quo. However, addin...
>
> Xiaohong Gong has updated the pull request incrementally with one additional
> commit since the last revision:
>
>   Refine the comment in ad file

Have you measured the performance of this micro-benchmark on a NEON machine?
https://github.com/openjdk/jdk/blob/f2d2eef988c57cc9f6194a8fd5b2b422035ee68f/test/micro/org/openjdk/bench/vm/compiler/TypeVectorOperations.java#L251-L256

We previously added a limitation for `int` only:
https://github.com/openjdk/jdk/blob/f2d2eef988c57cc9f6194a8fd5b2b422035ee68f/src/hotspot/cpu/aarch64/aarch64_vector.ad#L131-L134

Perhaps we also need to impose a similar limitation on `short` if the same regression occurs.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/26057#issuecomment-3039090274