Dhruv Chawla writes:
> On 08/05/25 18:43, Richard Sandiford wrote:
>> Otherwise it looks good. But I think we should think about how we
>> plan to integrate the related optimisation for register inputs. E.g.:
>>
>>   int32x4_t foo(int32_t x) {
>>     return vsetq_lane_s32(x, vdupq_n_s32(0), 0);
>>   }
Dhruv Chawla writes:
> This patch modifies Advanced SIMD assembly generation to emit an LDR
> instruction when a vector is created using a load to the first element with
> the other elements being zero.
>
> This is similar to what *aarch64_combinez already does.
>
> Example:
>
> uint8x16_t foo(
uint8_t, u8)
+LDR_NARROW (uint16x4_t, uint16_t, u16)
+LDR_NARROW (uint32x2_t, uint32_t, u32)
+LDR_NARROW (uint64x1_t, uint64_t, u64)
+
+LDR_NARROW (float16x4_t, float16_t, f16)
+LDR_NARROW (float32x2_t, float32_t, f32)
+LDR_NARROW (float64x1_t, float64_t, f64)
+
+LDR_NARROW (bfloat16x4_t, bfloat16_t, bf16)
On Sun, Jan 5, 2025 at 10:06 PM Dhruv Chawla wrote:
>
> This patch modifies Advanced SIMD assembly generation to emit an LDR
> instruction when a vector is created using a load to the first element with
> the other elements being zero.
>
> This is similar to what *aarch64_combinez already does.
+
+LDR_NARROW (float16x4_t, float16_t, f16)
+LDR_NARROW (float32x2_t, float32_t, f32)
+LDR_NARROW (float64x1_t, float64_t, f64)
+
+LDR_NARROW (bfloat16x4_t, bfloat16_t, bf16)
+
+/* { dg-final { scan-assembler-times "\\tldr" 24 } } */
+/* { dg-final { scan-assembler-not "\\tmov" } } */
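
For readers following the thread, here is a minimal portable sketch of the two
source shapes being discussed: a vector whose first element comes from memory
with the other elements zero (the case the patch description covers), and the
register-input variant raised in the review. It is not taken from the patch or
its testsuite. The type v4si and both function names are hypothetical, and it
uses the GCC/Clang generic vector extension instead of arm_neon.h so that it
builds on any target. It only pins down the intended semantics; whether a
given compiler emits a single LDR (or FMOV) for these functions is not
verified here.

```c
#include <stdint.h>

/* Hypothetical 128-bit vector of four int32_t lanes.  Uses the GCC/Clang
   generic vector extension, not arm_neon.h, so it is target-independent.  */
typedef int32_t v4si __attribute__ ((vector_size (16)));

/* Memory input: lane 0 is loaded from *ptr, every other lane is zero.
   This is the "load to the first element, other elements zero" shape.  */
static inline v4si
load_lane0_zero_rest (const int32_t *ptr)
{
  v4si v = { 0, 0, 0, 0 };
  v[0] = *ptr;
  return v;
}

/* Register input: same result shape, but the scalar is passed by value,
   mirroring vsetq_lane_s32 (x, vdupq_n_s32 (0), 0) from the review.  */
static inline v4si
set_lane0_zero_rest (int32_t x)
{
  v4si v = { 0, 0, 0, 0 };
  v[0] = x;
  return v;
}
```

Both helpers produce the same lane layout; they differ only in where the
scalar comes from, which is why the review asks how the two optimisations
should be integrated.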