On Wed, 2 Jul 2025 02:39:33 GMT, Xiaohong Gong <xg...@openjdk.org> wrote:

>> ### Background
>> On AArch64, the minimum vector length supported is 64-bit for basic types, 
>> except for `byte` and `boolean` (32-bit and 16-bit respectively to match 
>> special Vector API features). This limitation prevents intrinsification of 
>> vector type conversions between `short` and wider types (e.g. `long/double`) 
>> in Vector API when the entire vector length is within 128 bits, resulting in 
>> degraded performance for such conversions.
>> 
>> For example, type conversions between `ShortVector.SPECIES_128` and 
>> `LongVector.SPECIES_128` are not supported on AArch64 NEON and SVE 
>> architectures with 128-bit max vector size. This occurs because the compiler 
>> would need to generate a vector with 2 short elements, resulting in a 32-bit 
>> vector size.
>> 
>> To intrinsify such type conversion APIs, we need to relax the min vector 
>> length constraint from 64-bit to 32-bit for short vectors.
>> 
>> ### Impact Analysis
>> #### 1. Vector types
>> Only vectors with `short` element types are affected, because this change 
>> only adds support for 32-bit `short` vectors.
>> 
>> #### 2. Vector API
>> No impact on Vector API or the vector-specific nodes. The minimum vector 
>> shape at API level remains 64-bit. It's not possible to generate a final 
>> vector IR with 32-bit vector size. Type conversions may generate 
>> intermediate 32-bit vectors, but they will be resized or cast to vectors 
>> with at least 64-bit length.
>> 
>> #### 3. Auto-vectorization
>> Enables vectorization of cases containing only 2 `short` lanes, with 
>> significant performance improvements. Since 32-bit vectors have long been 
>> supported for the `byte` type, extending this to `short` does not introduce 
>> additional risk.
>> 
>> #### 4. Codegen of vector nodes
>> NEON doesn't support 32-bit SIMD instructions, so we use 64-bit instructions 
>> instead. For lanewise operations, this is safe because the upper half of 
>> the 64-bit result can be ignored.
>> 
>> Details:
>>  - Lanewise vector operations are unaffected as explained above.
>>  - NEON supports vector load/store instructions with 32-bit vector size, 
>> which we already use in relevant IRs (shared by SVE).
>>  - Cross-lane operations like reduction may be affected, potentially causing 
>> incorrect results for `min/max/mul/and` reductions. The min vector size for 
>> such operations should remain 64-bit. We've added assertions in match rules. 
>> Since it's currently not possible to generate such reductions (Vector API 
>> minimum is 64-bit, and SLP doesn't support subword type reductions), we 
>> maintain the status quo. However, addin...
>
> Xiaohong Gong has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Refine comments based on review suggestion

src/hotspot/cpu/aarch64/aarch64.ad line 2367:

> 2365:   // Theoretically, the minimal vector length supported by AArch64
> 2366:   // ISA and Vector API species is 64-bit. However, 32-bit or 16-bit
> 2367:   // vector length is also allowed for special Vector API usages.

Suggestion:

  // Usually, the shortest vector length supported by AArch64
  // ISA and Vector API species is 64 bits. However, we allow
  // 32-bit or 16-bit vectors in a few special cases.


Reason for change: it wasn't clear what "supported" meant: supported by the 
hardware, or by HotSpot? And why do we only support it in a few special cases? 
This comment raises more questions than it answers.
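
For context, the lane arithmetic behind the 32-bit case in the quoted 
description can be sketched as below. This is only an illustration: the class 
and method names are hypothetical and are not part of HotSpot or the Vector 
API. It assumes the source and destination vectors of a conversion have the 
same number of lanes.

```java
// Hypothetical helper for illustration only; not HotSpot or Vector API code.
final class LaneMath {
    // Width in bits of the source vector needed for a lane-for-lane
    // conversion into a destination vector of the given total width.
    static int sourceVectorBits(int destVectorBits, int destElemBits, int srcElemBits) {
        int lanes = destVectorBits / destElemBits; // lane count is set by the wider type
        return lanes * srcElemBits;
    }

    public static void main(String[] args) {
        // short -> long within 128 bits: 2 lanes of 16 bits each
        System.out.println(sourceVectorBits(128, Long.SIZE, Short.SIZE));   // prints 32
        // int -> long within 128 bits: 2 lanes of 32 bits each
        System.out.println(sourceVectorBits(128, Long.SIZE, Integer.SIZE)); // prints 64
    }
}
```

The `short` case lands below the usual 64-bit minimum, while the `int` case 
does not, which is why only `short` needs the relaxed limit here.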

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26057#discussion_r2179423549
