Re: [PATCH 2/2] Refined 256/512-bit vpacksswb/vpackssdw patterns.

2023-06-16 Uros Bizjak via Gcc-patches
On Fri, Jun 16, 2023 at 4:12 AM liuhongt wrote:
>
> The packing in vpacksswb/vpackssdw is not a simple concat; it's an
> interweave of src1 and src2 for every 128 bits (or 64 bits for the
> ss_truncate result).
>
> i.e.
>
> dst[192-255] = ss_truncate (src2[128-255])
> dst[128-191] = ss_truncate (s

[PATCH 2/2] Refined 256/512-bit vpacksswb/vpackssdw patterns.

2023-06-15 liuhongt via Gcc-patches
The packing in vpacksswb/vpackssdw is not a simple concat; it's an
interweave of src1 and src2 for every 128 bits (or 64 bits for the
ss_truncate result).

i.e.

dst[192-255] = ss_truncate (src2[128-255])
dst[128-191] = ss_truncate (src1[128-255])
dst[64-127] = ss_truncate (src2[0-127])
dst[0-63] =
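For readers who want to see the per-lane interleave spelled out, here is a minimal scalar sketch in C of the 256-bit vpackssdw layout described above. It is only an illustration, not code from the patch: the names ss_truncate32 and pack_ssdw_256 are hypothetical, and the placement of src1 in the low 64 bits of each lane is assumed from the instruction's documented behavior, since the preview above is cut off at that point.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Signed saturation of a 32-bit value to 16 bits (the ss_truncate step).
   Hypothetical helper name, chosen for this sketch only. */
static int16_t ss_truncate32(int32_t x)
{
    if (x > INT16_MAX)
        return INT16_MAX;
    if (x < INT16_MIN)
        return INT16_MIN;
    return (int16_t)x;
}

/* Scalar model of a 256-bit vpackssdw (hypothetical name).
   Each source holds 8 x int32, i.e. two 128-bit lanes of 4 elements.
   Within each lane, the low 64 bits of dst come from src1 and the
   high 64 bits from src2, so the result is an interleave per lane
   rather than concat (pack (src1), pack (src2)). */
static void pack_ssdw_256(int16_t dst[16],
                          const int32_t src1[8],
                          const int32_t src2[8])
{
    assert(dst != NULL && src1 != NULL && src2 != NULL);

    for (int lane = 0; lane < 2; lane++) {
        for (int i = 0; i < 4; i++) {
            /* dst bits [lane*128, lane*128 + 63] */
            dst[lane * 8 + i] = ss_truncate32(src1[lane * 4 + i]);
            /* dst bits [lane*128 + 64, lane*128 + 127] */
            dst[lane * 8 + 4 + i] = ss_truncate32(src2[lane * 4 + i]);
        }
    }
}
```

A plain concatenation would instead place all eight saturated src1 elements in dst[0..7] and all of src2 in dst[8..15]; the difference only shows up at 256 bits and wider, where there is more than one 128-bit lane.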